In this paper we present a theory of relational database systems based on
the partition lattice, which represents a new mathematical approach to the 
structure of relational database systems. A partition lattice can be defined for 
any given relation. This partition lattice is shown to be a meet-morphic image 
of the Boolean algebra of subsets of the attribute set. The partial ordering in 
the lattice is proved to be equivalent to the concept of functional dependency, 
and thus Armstrong's axioms for functional dependencies are proved. We 
solve the problem of finding the list of all keys by seeking the prime implicants 
of the Boolean function associated with the principal ideals generated by the 
attributes. We demonstrate the properties of the Boyce-Codd Normal Form
(BCNF), and give a modified algorithm for synthesizing an
information-lossless BCNF based on the principal filter. The necessary and sufficient
conditions for multivalued dependency (MVD) are given in terms of a lattice 
equation, and the inference rules of MVD are proved. The necessary and 
sufficient conditions for join dependency (JD) are given; consequently, we can 
prove the known result that acyclic join dependency (AJD) is equivalent to a 
set of MVDs. The concept of data independence is introduced, and is extended
to conditional independence and mutual independence. We establish this
algebraic theory of relational databases in the same spirit in which the theory of
probability was constructed. We present a comparison that demonstrates the
similarities.

I. INTRODUCTION 

The existing theory of relational databases is based on Codd's
relational model of data [1,2]. This relational database theory can be
considered to be the study of data dependencies (or independencies).
The theory was initiated by Codd with the introduction of the concept
of functional dependency; Codd observed that this concept can be used
to design better, normalized database schemes. The advantage of
normalized database schemes is that they remove the possibility of
update anomalies caused by undesirable data dependencies [2-5].

In the existing theory of logical database design, functional
dependencies are input constraints that must always hold in the relation [6]. In
the present paper, however, we take a different approach. We assume
that for a particular database designer, there exists a (finite) universal
relation $R[\Omega]$ for a given set of attributes $\Omega$, such that any relation $T$
on $\Omega$ is a subset of $R[\Omega]$. Furthermore, each subset $X$ of $\Omega$ corresponds
to an equivalence relation (partition) on the set of tuples of $R[\Omega]$.
That is, if two tuples in $R[\Omega]$ have the same $X$-value, then they are in
the same equivalence class. With this approach, the concept of
functional dependency becomes equivalent to the refinement partial
ordering of the partition lattice. The partitions on the (finite) set of
tuples of the universal relation $R[\Omega]$ can then be considered as the
fundamental constraints, from which the functional dependencies
(partial ordering) can be derived. Consequently, with our approach,
the functional dependencies are inherent properties of the universal
relation $R[\Omega]$. The input constraints of course must be consistent with
the inherent properties within the database.

Another kind of data dependency, proposed by Fagin [7] and Zaniolo [8],
is the multivalued dependency, which includes functional dependency
as a special case. Multivalued dependency is the necessary and
sufficient condition for the lossless-join decomposition of a relation into
two subrelations, such that the original relation can be regenerated by
the (natural) join operation [7-11]. Using the partition lattice we propose,
we can formulate multivalued dependency as a lattice equation (see
Section VI). We show that the axioms for functional dependencies [12]
and the inference rules of multivalued dependencies [13] can all be proved
as theorems within the framework of partition lattice theory. We show
how the concept of join dependency [10,11,14] is connected to multivalued
dependency. We give the necessary and sufficient condition for join
dependency and, consequently, we can prove the known result that
the acyclic join dependency is equivalent to a set of MVDs [15,16]. We
also introduce the concept of data independence, and the extension to
conditional independence and mutual independence of sets of
attributes.

The problem of listing all the keys of a relation is solved by using 
the concept of principal ideals in lattice theory. One form of a relation 
having desirable properties is the Boyce-Codd Normal Form (BCNF); 
we show that the concept of the principal filter (dual ideal) can be 
used to produce information-lossless Boyce-Codd Normal Forms. 
Both the theoretical foundation and the practical application of the
existing theory of relational databases appear to be fragmented. This 
paper shows that all the diverse kinds of data dependencies can be 
formulated within the lattice theory, which has the important advan- 
tage of unifying the theory of relational databases into a coordinated 
whole. Because of this, it would appear that future work in relational 
databases should be conducted using lattice theory as the basic frame- 
work. 

The establishment of this algebraic theory of relational databases is 
done in the same spirit as the construction of the theory of probability, 
although probability theory is of course unrelated to database theory. 
We are convinced that lattice theory could play a role in the theory
of relational databases similar to the role that measure theory plays
in the theory of probability [17].

The basic notion of relational databases is defined in Section II, 
and the partition lattice of the relation is introduced in Section III. 
The problem of listing all keys is solved in Section IV, where the 
Boolean functions associated with the principal ideals are defined. 
The properties of the Boyce-Codd Normal Form are studied in Section 
V, where we present a modified algorithm for synthesizing informa- 
tion-lossless BCNFs based on the principal filters. Section VI is 
devoted to the proof of equivalence between multivalued dependency 
and a lattice equation. Section VII discusses join dependency and 
acyclic join dependency. Finally, in Section VIII we outline a possible 
direction for future research, as well as a comparison that shows the 
similarities between probability theory and the algebraic theory of 
relational databases. In Appendix A we list the laws of lattice theory 
for reference. The proofs of the axioms for functional and multivalued 
dependencies are listed in Appendix B. 

Unless otherwise stated, we refer to the universal relation as simply 
"the relation" in the remainder of this paper. 

II. RELATIONS 

An attribute is a symbol taken from a finite set $\Omega = \{A_1, A_2, \ldots, A_n\}$.
For each attribute $A$ there is a set of possible values called its
domain, denoted $\mathrm{DOM}(A)$. We will use capital letters from the
beginning of the alphabet ($A, B, \ldots$) for single attributes, and capital letters
from the end of the alphabet ($X, Y, \ldots$) for sets of attributes. For a
set of attributes $X \subseteq \Omega$, an $X$-value $x$ is an assignment of values to the
attributes of $X = \{A_{i_1}, A_{i_2}, \ldots, A_{i_k}\}$ from their respective domains.
The notation $XY$ will be used to represent the union of two arbitrary
sets of attributes $X, Y \subseteq \Omega$.

A relation $R$ on the set of attributes $\Omega = \{A_1, \ldots, A_n\}$ is a subset of
the Cartesian product $\mathrm{DOM}(A_1) \times \cdots \times \mathrm{DOM}(A_n)$. The elements
(rows) of $R$ are called tuples. A relation $R$ on $\{A_1, \ldots, A_n\}$ will be
denoted by $R[A_1 \cdots A_n]$. Similarly, if $R$ is defined on the union of sets
$\{X_1, X_2, \ldots, X_m\}$, then the notation $R[X_1 \cdots X_m]$ will be used. A
relation can be visualized as a table whose columns are labeled with
attributes and whose rows depict tuples. The ordering of the rows and
columns is immaterial. The cardinality of $R$ is the total number of tuples
in $R$ and is denoted by $|R|$.

Let $t$ be a tuple in $R[\Omega]$. For $X \subseteq \Omega$, $t[X]$ denotes the tuple that
contains the components of $t$ corresponding to the attributes of $X$.
The projection of $R$ on $X$, denoted by $R[X]$, is defined as follows:

$$R[X] = \{t[X] \mid t \in R\}.$$

Similarly, the conditional projection of $R$ on $X$ by a $Y$-value $y$, where
$Y \subseteq \Omega$, is defined as follows:

$$R_y[X] = \{t[X] \mid t \in R \text{ and } t[Y] = y\}.$$

Let $R[XZ]$ and $S[YZ]$ be relations where $X$, $Y$, and $Z$ are disjoint
sets of attributes. The join (natural join) of $R$ and $S$, denoted by
$R \bowtie S$, is the relation $T[XYZ]$ whose attributes are $XYZ$, and is
defined as follows:

$$T[XYZ] = R[XZ] \bowtie S[YZ] = \{(x, y, z) \mid (x, z) \in R \text{ and } (y, z) \in S\}.$$

The join can also be defined as the union of a collection of Cartesian
products:

$$T[XYZ] = R[XZ] \bowtie S[YZ] = \bigcup \{R_z[X] \times S_z[Y] \times \{z\} \mid z \in R[Z] \cap S[Z]\}.$$
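The two definitions of the join can be compared on a small instance. The following is a minimal sketch, not part of the original development: the relations `R` and `S`, their values, and all function names are hypothetical, and each tuple is stored as a pair whose second component is the $Z$-value.

```python
from itertools import product

# R[XZ] and S[YZ] stored as sets of (x, z) and (y, z) pairs (made-up values).
R = {(1, "a"), (2, "a"), (3, "b")}
S = {(10, "a"), (20, "a"), (30, "c")}


def join_pairwise(r, s):
    """First definition: all (x, y, z) with (x, z) in R and (y, z) in S."""
    return {(x, y, z) for (x, z) in r for (y, z2) in s if z == z2}


def conditional_projection(rel, z):
    """R_z[X] (or S_z[Y]): first components of the tuples whose Z-value is z."""
    return {v for (v, z2) in rel if z2 == z}


def join_by_products(r, s):
    """Second definition: union of Cartesian products over the shared Z-values."""
    shared = {z for (_, z) in r} & {z for (_, z) in s}  # R[Z] intersected with S[Z]
    joined = set()
    for z in shared:
        xs = conditional_projection(r, z)
        ys = conditional_projection(s, z)
        joined |= set(product(xs, ys, [z]))
    return joined


T = join_pairwise(R, S)
```

On this instance only the $Z$-value `"a"` is shared, so both functions return the same four tuples.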

Let $R$ be a relation on the set of attributes $\Omega$. We may have two sets
of attributes $X, Y \subseteq \Omega$, such that for any two tuples $t_1, t_2 \in R$,
$t_1[X] = t_2[X]$ implies $t_1[Y] = t_2[Y]$. We say then that $X$ functionally
determines $Y$ in $R$, and denote this fact by $X \to Y$. A functional dependency
(FD) $X \to Y$ is trivial, meaning it holds in all relations, if $Y \subseteq X$. Note
that FDs enjoy the projectivity and inverse projectivity properties [3,4]:
for sets $X, Y \subseteq \Omega' \subseteq \Omega$, the FD $X \to Y$ is valid in $R[\Omega]$ iff it is valid
in $R[\Omega']$.

We say that a set of relations $\{R[\Omega_1], \ldots, R[\Omega_n]\}$ has the
information-lossless join property if $\Omega = \Omega_1 \cdots \Omega_n$ and

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n].$$

If the set $\{R[\Omega_1], \ldots, R[\Omega_n]\}$ does not have this property, we say that
it has a lossy join [14]. An important property of functional dependency
is that if the FD $X \to Y$ is valid in $R[\Omega]$, then

$$R[\Omega] = R[(\Omega - Y)X] \bowtie R[XY].$$

This property will be discussed in more detail in Section VI.
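The lossless-join property of a functional dependency can be checked mechanically. The sketch below assumes the eight tuples of the relation $R[ECSY]$ that appears as Table I in Section III (salaries written in thousands of dollars); the helper names `value`, `project`, `fd_holds`, and `join` are illustrative and not taken from the paper.

```python
ATTRS = "ECSY"  # Employee, Child, Salary (in $K), Year
ROWS = {
    ("Hilbert", "Hubert", 35, 1975), ("Hilbert", "Hubert", 40, 1976),
    ("Gauss", "Gwendolyn", 40, 1975), ("Gauss", "Gwendolyn", 50, 1976),
    ("Gauss", "Greta", 40, 1975), ("Gauss", "Greta", 50, 1976),
    ("Pythagoras", "Peter", 15, 1975), ("Pythagoras", "Peter", 20, 1976),
}


def value(t, attrs):
    """t[X]: the components of tuple t for the attributes in attrs."""
    return tuple(t[ATTRS.index(a)] for a in attrs)


def project(attrs):
    """R[X]: the set of X-values occurring in the relation."""
    return {value(t, attrs) for t in ROWS}


def fd_holds(x, y):
    """X -> Y: tuples that agree on X also agree on Y."""
    seen = {}
    return all(seen.setdefault(value(t, x), value(t, y)) == value(t, y) for t in ROWS)


def join(x, y):
    """Natural join of R[X] and R[Y]; result columns follow the order of ATTRS."""
    out = [a for a in ATTRS if a in x or a in y]
    result = set()
    for r in project(x):
        for s in project(y):
            tr, ts = dict(zip(x, r)), dict(zip(y, s))
            if all(tr[a] == ts[a] for a in tr.keys() & ts.keys()):
                merged = {**tr, **ts}
                result.add(tuple(merged[a] for a in out))
    return result
```

With $X = C$ and $Y = E$ the FD $C \to E$ holds, so `join("CSY", "CE")` regenerates the relation. Splitting on $S$ instead, which determines neither $E$ nor $C$, gives a lossy join containing spurious tuples.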

III. THE RELATION LATTICE 

If $S$ is a nonempty set, then a subset $\rho$ of $S \times S$ is called a binary
relation on $S$. The product of two binary relations $\rho, \rho' \subseteq S \times S$ is
defined as:

$$\rho \circ \rho' = \{(a, b) \in S \times S \mid \exists c \in S \text{ such that } (a, c) \in \rho,\ (c, b) \in \rho'\}.$$

We say that a relation $\rho$ on $S$ is reflexive if $(a, a) \in \rho$ for every $a$ in $S$;
that $\rho$ is symmetric if $\rho^{-1} = \rho$, i.e., if

$$(\forall a, b \in S),\ (a, b) \in \rho \text{ implies } (b, a) \in \rho;$$

and that $\rho$ is transitive if $\rho \circ \rho \subseteq \rho$, i.e., if

$$(\forall a, b, c \in S),\ (a, b) \in \rho \text{ and } (b, c) \in \rho \text{ imply } (a, c) \in \rho.$$

A binary relation is called an equivalence relation if it is reflexive,
symmetric, and transitive.

A family $\pi = \{B_i \mid i \in I\}$ of subsets, called blocks of $S$, is said to form
a partition of $S$ if the following conditions hold:

1. Each $B_i$ is nonempty.

2. For all $i \neq j$ in $I$, $B_i \cap B_j = \emptyset$.

3. $\bigcup \{B_i \mid i \in I\} = S$.

The two apparently different notions of "equivalence relation" and
"partition" are interchangeable: Let $\rho$ be an equivalence relation on a
set $S$. Then the family of subsets $a\rho = \{b \mid (a, b) \in \rho\}$, $a \in S$, is a partition
of $S$. Conversely, if $\pi = \{B_i \mid i \in I\}$ is a partition of $S$, then the relation
$\{(a, b) \mid (\exists i \in I),\ a, b \in B_i\}$ is an equivalence relation on $S$.

If $\rho$ is an equivalence relation (partition) on $S$, we shall sometimes
write $a \rho b$ as an alternative to $(a, b) \in \rho$. The sets $a\rho$ that form the
associated partition of the equivalence relation are called $\rho$-classes.
The set of $\rho$-classes is called the quotient set of $S$ by $\rho$ and is denoted
by $S/\rho$.

A binary relation $\le$ on the set $S$ is a partial ordering of $S$ if and only
if $\le$ is reflexive; antisymmetric, i.e., if

$$(\forall a, b \in S),\ a \le b \text{ and } b \le a \text{ imply } a = b;$$

and transitive. A set $S$ with a partial ordering $\le$ is called a partially
ordered set (poset) and it is denoted by the pair $(S, \le)$.

Let $(S, \le)$ be a poset and let $T$ be a subset of $S$. Then, $a \in S$ is the
greatest lower bound (g.l.b.) of $T$ iff

1. $(\forall t \in T),\ a \le t$.

2. $(\forall t \in T),\ a' \le t$ implies $a' \le a$.

Similarly, $a \in S$ is the least upper bound (l.u.b.) of $T$ iff

1. $(\forall t \in T),\ t \le a$.

2. $(\forall t \in T),\ t \le a'$ implies $a \le a'$.

A lattice is a poset in which any two elements $a$ and $b$ have a g.l.b.,
called a meet and denoted by $a \cdot b$, and a l.u.b., called a join and denoted
by $a + b$. We sometimes write the meet $a \cdot b$ as $ab$ if no confusion is
created. The properties of the meet and join operations of a lattice [18]
are listed in Appendix A.

Let the set of all partitions $\pi$ on $S$ be denoted by $\Pi(S)$, and define
the partial ordering on $\Pi(S)$ as follows:

If $(\forall a, b \in S)$, $a \pi_1 b$ implies $a \pi_2 b$, then $\pi_1 \le \pi_2$.

The poset $(\Pi(S), \le)$ is seen to be a lattice $(\Pi(S), \cdot, +)$ with a
universal lower bound $0 = \{B_i \mid i \in I\}$ such that every block $B_i$ is a
singleton, and a universal upper bound $1 = \{S\}$. To specify a particular
partition, we list the elements, and distinguish blocks with bars
and semicolons. For example, if $S = \{1, 2, 3, 4, 5\}$ and partition $\pi$ on
$S$ has blocks $\{1, 3, 4\}$, $\{2, 5\}$, then we write $\pi = \{\overline{1, 3, 4};\ \overline{2, 5}\}$. The meet
and join of any two partitions $\pi_1, \pi_2 \in \Pi(S)$ can be determined as
follows:

1. $(\forall a, b \in S)$, $a\,(\pi_1 \cdot \pi_2)\,b$ iff $a \pi_1 b$ and $a \pi_2 b$.

2. $(\forall a, b \in S)$, $a\,(\pi_1 + \pi_2)\,b$ iff $\exists n \in N$ and $c_0, \ldots, c_n \in S$ such that
$a = c_0$, $b = c_n$, and $c_i \pi_1 c_{i+1}$ or $c_i \pi_2 c_{i+1}$ for each $i$, $0 \le i \le n - 1$.
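The meet and join of two partitions can be computed directly from these two rules. The following is a minimal sketch under the assumption that a partition is stored as a frozenset of frozenset blocks; the function names and the second example partition are illustrative only.

```python
def partition(*blocks):
    """Build a partition from its blocks."""
    return frozenset(frozenset(b) for b in blocks)


def meet(p, q):
    """Rule 1: two elements are related iff they share a block in both p and q."""
    return frozenset(b & c for b in p for c in q if b & c)


def join(p, q):
    """Rule 2: merge blocks that are linked through a chain of overlaps."""
    blocks = [set(b) for b in p]
    for c in q:
        touching = [b for b in blocks if b & c]
        blocks = [b for b in blocks if not b & c]
        blocks.append(set(c).union(*touching))
    return frozenset(frozenset(b) for b in blocks)


def refines(p, q):
    """p <= q in the refinement ordering: each block of p lies in a block of q."""
    return all(any(b <= c for c in q) for b in p)


p = partition({1, 3, 4}, {2, 5})    # the example partition from the text
q = partition({1, 3}, {4, 5}, {2})  # a second, made-up partition
```

Here `meet(p, q)` has the blocks {1, 3}, {4}, {5}, {2}, while `join(p, q)` is the universal upper bound with the single block {1, 2, 3, 4, 5}, because the block {4, 5} of `q` links the two blocks of `p`.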

A complemented distributive lattice is called a Boolean algebra (see
Appendix A). The set of all subsets of $S$, called the power set of $S$ and
denoted by $2^S$, with the partial ordering $(\forall S_1, S_2 \in 2^S)$, $S_1 \le S_2$ iff
$S_1 \supseteq S_2$, is a Boolean algebra $(2^S, \cdot, +, \bar{\ })$ with the universal bounds
$0 = S$ and $1 = \emptyset$. The dual of a poset is the poset with the converse partial
ordering on the same elements. The Boolean algebra defined above is
the dual of the conventional Boolean algebra of the power set. The
operations of meet and join are defined by

1. Meet (g.l.b.) $S_1 \cdot S_2 = S_1 \cup S_2$,

2. Join (l.u.b.) $S_1 + S_2 = S_1 \cap S_2$,

and the complement of $S_1 \in 2^S$ is $\bar{S_1} = S - S_1$.

Let $\psi: L \to M$ be a function from a lattice $L$ into a lattice $M$. We
say $\psi$ is a meet-morphism if

$$(\forall a, b \in L),\ \psi(a \cdot b) = \psi(a) \cdot \psi(b),$$

and $\psi$ is a join-morphism if

$$(\forall a, b \in L),\ \psi(a + b) = \psi(a) + \psi(b).$$

Meet-morphisms and join-morphisms are both isotone
(order-preserving); i.e.,

$$(\forall a, b \in L),\ a \le b \text{ implies } \psi(a) \le \psi(b),$$

and any order-preserving one-to-one mapping with an order-preserving
inverse is an isomorphism [18].

Let $R$ be a relation on the set of attributes $\Omega$. The set of all subsets
of $\Omega$, denoted $2^\Omega$, with the partial ordering defined by set-containment,
is a Boolean algebra $(2^\Omega, \cdot, +, \bar{\ })$ [18], where the meet, join, and
complement operations are defined as above. For every $X \in 2^\Omega$, there is an
equivalence relation (partition) on the set of tuples in $R[\Omega]$ defined as
follows:

Definition 1: Let $R$ be a relation on the set of attributes $\Omega$. Each subset
of $\Omega$ is associated with a partition of the set of tuples of $R$. We define
the function $\theta: 2^\Omega \to \Pi(R[\Omega])$, which we call the partition function
(associated with $R[\Omega]$), by

$$\theta: X \mapsto \theta(X) = \{(t_1, t_2) \in R[\Omega] \times R[\Omega] \mid t_1[X] = t_2[X]\}. \quad ■$$

In general, the image set $\mathrm{Im}(\theta)$ of $\theta$ is not a sublattice of $\Pi(R[\Omega])$.
Since $\pi_1, \pi_2 \in \mathrm{Im}(\theta)$ implies $\pi_1 \cdot \pi_2 \in \mathrm{Im}(\theta)$, $\mathrm{Im}(\theta)$ is a complete lattice
in its own right [20], and it will be called the relation lattice of $R[\Omega]$, and
denoted by $L(R[\Omega])$. Note that there are no duplicated tuples in $R[\Omega]$,
so that $\theta(\Omega) = 0$. Since the tuples cannot be "differentiated" by the
empty set of attributes, we define $\theta(\emptyset) = 1$. The universal bounds of
$L(R[\Omega])$ are the same as those in $\Pi(R[\Omega])$. We immediately recognize
the concept of functional dependency to be equivalent to the
refinement partial ordering of the partitions.

Lemma 1: Let $R[\Omega]$ be a relation on the set of attributes $\Omega$, and let
$\theta: 2^\Omega \to \Pi(R[\Omega])$ be the partition function associated with $R[\Omega]$, defined above.
Then

$$X \to Y \text{ iff } \theta(X) \le \theta(Y). \quad ■$$

An immediate consequence of the above lemma is that the projection
$R[X]$ of $R[\Omega]$ on $X$ is simply the quotient of $R[\Omega]$ by $\theta(X)$, i.e.,
$R[X] = R[\Omega]/\theta(X)$. Thus each tuple in $R[X]$ corresponds to a $\theta(X)$-class in
$R[\Omega]/\theta(X)$ and it takes the $X$-value only. Note that $\theta(X) = \theta(Y)$ does
not imply $R[X] = R[Y]$ because the attributes $X$ and $Y$ may have
different sets of values.

Theorem 1: Let $R$ be a relation on the set of attributes $\Omega$, and let $L(R[\Omega])$
be the relation lattice of $R$. Then the partition function $\theta: 2^\Omega \to L(R[\Omega])$
is a meet-morphism.
Proof: We want to show that

$$\theta(XY) = \theta(X)\theta(Y), \quad \forall X, Y \in 2^\Omega.$$

Suppose $t_1\,\theta(XY)\,t_2$. Then, $t_1[XY] = t_2[XY]$, which implies

$$t_1[X] = t_2[X] \text{ and } t_1[Y] = t_2[Y].$$

Hence,

$$t_1\,\theta(X)\,t_2 \text{ and } t_1\,\theta(Y)\,t_2.$$

By the definition of the meet operation, we have

$$t_1\,\theta(X)\theta(Y)\,t_2,$$

so that

$$\theta(XY) \le \theta(X)\theta(Y).$$

Suppose $t_1\,\theta(X)\theta(Y)\,t_2$. Then,

$$t_1\,\theta(X)\,t_2 \text{ and } t_1\,\theta(Y)\,t_2,$$

so that

$$t_1[X] = t_2[X] \text{ and } t_1[Y] = t_2[Y],$$

and thus

$$t_1[XY] = t_2[XY].$$

Consequently,

$$t_1\,\theta(XY)\,t_2,$$

so that

$$\theta(X)\theta(Y) \le \theta(XY).$$

Hence

$$\theta(XY) = \theta(X)\theta(Y). \quad ■$$

Note that the partition function is order-preserving, but it is in
general not a join-morphism.* However, if $\theta(X + Y) = \theta(X) + \theta(Y)$
holds in $L(R[\Omega])$, the pair $(X, Y)$ has a special property in the relation.
This is discussed further in Section VI.

It is clear now that Armstrong's axioms for functional dependencies
become theorems within the framework of lattice theory. The proofs
of the axioms for functional dependencies are given in Appendix B.

Let $R$ be a relation on the set of attributes $\Omega$, and let $\theta: 2^\Omega \to L(R[\Omega])$
be the partition function associated with $R[\Omega]$. Then the relation
$\theta \circ \theta^{-1}$ on $2^\Omega$ defined by

$$\theta \circ \theta^{-1} = \{(X, Y) \in 2^\Omega \times 2^\Omega \mid \theta(X) = \theta(Y)\}$$

is obviously an equivalence relation. Sets in the quotient set $2^\Omega / \theta \circ \theta^{-1}$
will be called classes.

* The join of $\pi_1$ and $\pi_2$ in $L(R[\Omega])$ may be different from their join in $\Pi(R[\Omega])$. We
will use the notation $\pi_1 \oplus \pi_2$ to denote the join of $\pi_1$ and $\pi_2$ in $\Pi(R[\Omega])$, while $\pi_1 + \pi_2$
will denote the join of $\pi_1$ and $\pi_2$ in $L(R[\Omega])$; e.g., in Example 1 below,
$\theta(E) \oplus \theta(S) = \{\overline{1, 2, 3, 4, 5, 6};\ \overline{7, 8}\}$, and $\theta(E) + \theta(S) = \{\overline{1, 2, 3, 4, 5, 6, 7, 8}\}$.
Table I. Relation R[ECSY]

    Tuple   Employee     Child       Salary   Year
    1       Hilbert      Hubert      $35K     1975
    2       Hilbert      Hubert      $40K     1976
    3       Gauss        Gwendolyn   $40K     1975
    4       Gauss        Gwendolyn   $50K     1976
    5       Gauss        Greta       $40K     1975
    6       Gauss        Greta       $50K     1976
    7       Pythagoras   Peter       $15K     1975
    8       Pythagoras   Peter       $20K     1976


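Example 1: Consider the relation $R$ in Table I (see Ref. 7). Let
$\Omega = \{E, C, S, Y\}$ be the set of attributes, where E = employee, C = child,
S = salary, Y = year. Then

$$2^\Omega = \{\emptyset, E, C, S, Y, EC, ES, EY, CS, CY, SY, ECS, ECY, ESY, CSY, ECSY\},$$

and

$$\theta(\emptyset) = \{\overline{1, 2, 3, 4, 5, 6, 7, 8}\} = 1,$$

$$\theta(E) = \{\overline{1, 2};\ \overline{3, 4, 5, 6};\ \overline{7, 8}\} = \pi_1,$$

$$\theta(C) = \theta(EC) = \{\overline{1, 2};\ \overline{3, 4};\ \overline{5, 6};\ \overline{7, 8}\} = \pi_2,$$

$$\theta(S) = \{1;\ \overline{2, 3, 5};\ \overline{4, 6};\ 7;\ 8\} = \pi_3,$$

$$\theta(Y) = \{\overline{1, 3, 5, 7};\ \overline{2, 4, 6, 8}\} = \pi_4,$$

$$\theta(ES) = \theta(EY) = \theta(SY) = \theta(ESY) = \{1;\ 2;\ \overline{3, 5};\ \overline{4, 6};\ 7;\ 8\} = \pi_5,$$

$$\theta(CS) = \theta(CY) = \theta(ECY) = \theta(ECS) = \theta(CSY) = \theta(ECSY) = \{1;\ 2;\ 3;\ 4;\ 5;\ 6;\ 7;\ 8\} = 0.$$

The Hasse diagram [18,21] of the relation lattice is illustrated in Fig. 1. ■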
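The partitions of Example 1, and the equivalence stated in Lemma 1, can be recomputed from Table I. The sketch below is illustrative only: tuples are numbered 1 to 8 as in the table, salaries are written in thousands of dollars, and the names `theta`, `refines`, `fd_holds`, and `show` are assumptions of this sketch rather than notation from the paper.

```python
from itertools import combinations

ATTRS = "ECSY"  # Employee, Child, Salary (in $K), Year
ROWS = [        # listed in the order of Table I, so tuple k is ROWS[k - 1]
    ("Hilbert", "Hubert", 35, 1975), ("Hilbert", "Hubert", 40, 1976),
    ("Gauss", "Gwendolyn", 40, 1975), ("Gauss", "Gwendolyn", 50, 1976),
    ("Gauss", "Greta", 40, 1975), ("Gauss", "Greta", 50, 1976),
    ("Pythagoras", "Peter", 15, 1975), ("Pythagoras", "Peter", 20, 1976),
]


def value(t, attrs):
    """t[X]: the components of tuple t for the attributes in attrs."""
    return tuple(t[ATTRS.index(a)] for a in attrs)


def theta(attrs):
    """Partition function: group tuple numbers by equal X-value."""
    blocks = {}
    for number, t in enumerate(ROWS, start=1):
        blocks.setdefault(value(t, attrs), set()).add(number)
    return frozenset(frozenset(b) for b in blocks.values())


def refines(p, q):
    """p <= q: every block of p lies inside some block of q."""
    return all(any(b <= c for c in q) for b in p)


def fd_holds(x, y):
    """X -> Y checked straight from the definition of functional dependency."""
    seen = {}
    return all(seen.setdefault(value(t, x), value(t, y)) == value(t, y) for t in ROWS)


def show(p):
    """Render a partition with commas inside blocks and semicolons between them."""
    return "; ".join(",".join(map(str, sorted(b))) for b in sorted(p, key=min))


SUBSETS = ["".join(c) for k in range(len(ATTRS) + 1) for c in combinations(ATTRS, k)]
IMAGE = {theta(x) for x in SUBSETS}  # the elements of the relation lattice
```

For instance, `show(theta("ES"))` gives `1; 2; 3,5; 4,6; 7; 8`, and `IMAGE` contains the seven distinct partitions listed in the example.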

IV. LIST OF KEYS 

Let $R$ be a relation on the set of attributes $\Omega$. We say that $X \subseteq \Omega$ is
a superkey of $R$ if $X \to A$, $\forall A \in \Omega$. If $X$ is a superkey and no proper
subset of $X$ is a superkey, $X$ is said to be a key of $R$ [1,2,5].
Lemma 2: $X \subseteq \Omega$ is a superkey of $R$ iff $\theta(X) = 0$.
Proof: (Necessity) Let $\Omega = \{A_1, \ldots, A_n\}$, and $X \to A_i$, $\forall A_i \in \Omega$. Then,

$$\theta(X) \le \theta(A_i), \quad \forall A_i \in \Omega.$$

By the definition of the meet operation, we have

$$\theta(X) \le \theta(A_1)\theta(A_2) \cdots \theta(A_n).$$

It follows from Theorem 1 that

$$\theta(X) \le \theta(A_1 A_2 \cdots A_n) = \theta(\Omega) = 0.$$

Hence,

$$\theta(X) = 0.$$

(Sufficiency) Suppose $\theta(X) = 0$. Then

$$\theta(X) \le \theta(A_i), \quad \forall A_i \in \Omega.$$

Hence,

$$X \to A_i, \quad \forall A_i \in \Omega. \quad ■$$

Fig. 1. Relation lattice $L(R[\Omega])$.

An ideal is a subset $J$ of a lattice $L$ with the properties [18]

1. $a \in J$, $x \in L$, and $x \le a$ imply $x \in J$,

2. $a, b \in J$ implies $a + b \in J$.

For every $a \in L$, the subset of all elements "less than or equal to" $a$ is
evidently an ideal; it is called the principal ideal of $L$ generated by $a$,
and is denoted by $(a]$, i.e.,

$$(a] = \{x \in L \mid x \le a\}.$$

Definition 2: Let $R$ be a relation on the set of attributes $\Omega = \{A_1, \ldots, A_n\}$.
For each $A_i \in \Omega$, $J_i = (\theta(A_i)]$ is the principal ideal of the
relation lattice $L(R[\Omega])$ generated by $\theta(A_i)$. A Boolean function

$$f_i(A_1, \ldots, A_n) = \sum X$$

defined on $2^\Omega$ is the Boolean sum of all $X \in 2^\Omega$ such that $\theta(X) \le \theta(A_i)$.
We will call $f_i$ the principal ideal function (generated by $A_i$). ■

This function plays a role similar to the Boolean function used in
Ref. 16.
Theorem 2: Let $R$ be a relation on $\Omega = \{A_1, \ldots, A_n\}$. $X \subseteq \Omega$ is a
superkey of $R$ iff $X$ is a product term in the expansion of the Boolean
function

$$F(A_1, \ldots, A_n) = \prod_{i=1}^{n} f_i(A_1, \ldots, A_n),$$

where $f_i$ is the principal ideal function generated by $A_i$.

Proof: The Boolean function $F(A_1, \ldots, A_n)$ has the expansion

$$F(A_1, \ldots, A_n) = \prod_{i=1}^{n} f_i = \sum X_1 \cdots X_n.$$

We want to show that every term $K = X_1 \cdots X_n$ is a superkey. Since
$\theta(X_i) \in J_i = (\theta(A_i)]$, it follows that

$$\theta(X_i) \le \theta(A_i), \quad 1 \le i \le n.$$

From L6 in Appendix A, we have

$$\theta(X_1)\theta(X_2) \cdots \theta(X_n) \le \theta(A_1)\theta(A_2) \cdots \theta(A_n).$$

It follows from Theorem 1 that

$$\theta(X_1 \cdots X_n) \le \theta(A_1 \cdots A_n) = \theta(\Omega) = 0,$$

and thus

$$\theta(X_1 X_2 \cdots X_n) = 0.$$

Hence, $K = X_1 X_2 \cdots X_n$ is a superkey of $R$.
Conversely, suppose $X$ is a superkey of $R$. Then

$$X \to A_i, \quad \forall A_i \in \Omega.$$

Thus,

$$\theta(X) \le \theta(A_i), \quad 1 \le i \le n.$$

By the definition of the principal ideal $J_i$, we must have

$$\theta(X) \in J_i = (\theta(A_i)], \quad 1 \le i \le n.$$

It follows that $X = X \cdots X$ ($n$ times) is a product term in the
expansion of $F(A_1, \ldots, A_n)$. ■

It is natural to call $F(A_1, \ldots, A_n)$ the key Boolean function of the
relation $R[A_1 \cdots A_n]$. Since any key $X$ is a superkey of $R$, $X$ must be
a product term of the key Boolean function $F(A_1, \ldots, A_n)$. Since no
proper subset of $X$ is a superkey, then by the definition of the prime
implicant of a Boolean function [22], we have

Corollary 1: Let $R$ be a relation on the set of attributes $\Omega = \{A_1, \ldots, A_n\}$.
$X \subseteq \Omega$ is a key of $R$ iff $X$ is a prime implicant of the key Boolean
function $F(A_1, \ldots, A_n)$. ■

An attribute $A \in \Omega$ is prime in $R[\Omega]$ if $A$ is in some key of $R$; otherwise
$A$ is nonprime. $A \in \Omega$ is a nonprime attribute if and only if the key
Boolean function is independent of $A$.

Theorem 3: Let $R$ be a relation on $\Omega$. $A \in \Omega$ is a nonprime attribute iff
there exists $X \subseteq \Omega$ such that

1. $A \notin X$, $X \to A$,

2. $AZ \to X$ implies $Z \to X$.

Proof: (Necessity) Let $A \in \Omega$ be a nonprime attribute, and let $X$ be
any key of $R$. Then

$$A \notin X, \text{ and } X \to A.$$

Suppose $AZ \to X$. Then $\theta(AZ) \le \theta(X) = 0$. It follows that $\theta(AZ) = 0$
and thus $AZ$ is a superkey; it contains a key $K \subseteq AZ$ and $A \notin K$. We
have $K \subseteq Z$, so that

$$\theta(Z) \le \theta(K) = 0 = \theta(X).$$

Hence,

$$Z \to X.$$
(Sufficiency) Let $\Omega = \{A_1, \ldots, A_n\}$, $n > 2$. Assume there is an
$X = A_2, \ldots, A_m$, such that (1) $X \to A_1$, and (2) $A_1 Z \to X$ implies $Z \to X$.
We want to show that $A_1$ must be a nonprime attribute. The key
Boolean function $F(A_1, \ldots, A_n)$ of $R[\Omega]$ can be written in the form

$$F(A_1, \ldots, A_n) = \prod_{i=1}^{n} f_i = f_1 f_2 \cdots f_m f_{m+1} \cdots f_n = (f_1 f_X f_{m+1}) \cdots (f_1 f_X f_n),$$

where $f_X = f_2 \cdots f_m$. For any product term $Y$ in $f_X$ we have

$$\theta(Y) \le \theta(X) \le \theta(A_1).$$

Therefore, $Y$ must be a term in $f_1$. It follows that $f_1$ has the form

$$f_1 = f_X + g$$

for some Boolean function $g$. Since $\theta(A_1 Z) \le \theta(X)$ implies $\theta(Z) \le \theta(X)$,
$f_X$ can be written in the form

$$f_X = A_1 h + h + p = h + p$$

for some Boolean functions $h$ and $p$ which are independent of $A_1$. Also,
every $f_j$, $j = m + 1, \ldots, n$, can be written in the form

$$f_j = f_1 e + f_X e + q$$

for some Boolean functions $e$ and $q$, which are independent of $A_1$. It
follows that

$$f_1 f_X f_j = f_1 f_X (f_1 e + f_X e + q)$$
$$= f_X (f_1 e + f_1 f_X e + f_1 q)$$
$$= f_X (f_1 e + f_1 q)$$
$$= f_1 f_X (e + q) = (f_X + g) f_X (e + q)$$
$$= f_X (e + q) = (h + p)(e + q).$$

Since $h$, $p$, $e$, and $q$ are all independent of $A_1$, we know that $f_1 f_X f_j$ is
independent of $A_1$ for all $j = m + 1, \ldots, n$. Clearly, no prime implicant
of $F(A_1, \ldots, A_n)$ contains $A_1$, and therefore $A_1$ is a nonprime
attribute. ■

Example 2: Consider the relation $R$ in Example 1. To obtain the prime
implicants of the key Boolean function $F$, we can first simplify each
principal ideal function. The principal ideal functions of the relation
$R[ECSY]$ are

$$f_E = E + C + SY,$$
$$f_C = C,$$
$$f_S = S + EY + CY,$$
$$f_Y = Y + ES + CS,$$

and the key Boolean function is

$$F(E, C, S, Y) = (E + C + SY) \cdot C \cdot (S + EY + CY) \cdot (Y + ES + CS) = CS + CY.$$

The sets $CS$ and $CY$ are the keys, and $E$ is the only nonprime
attribute. ■
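The result of Example 2 can be cross-checked by brute force rather than by Boolean simplification. The following sketch is an assumption-laden illustration, not the paper's method: it enumerates attribute subsets of Table I, keeps the minimal ones that determine a given attribute (the terms of a simplified principal ideal function), and takes the minimal superkeys as the keys. All names are hypothetical.

```python
from itertools import combinations

ATTRS = "ECSY"  # Employee, Child, Salary (in $K), Year
ROWS = [
    ("Hilbert", "Hubert", 35, 1975), ("Hilbert", "Hubert", 40, 1976),
    ("Gauss", "Gwendolyn", 40, 1975), ("Gauss", "Gwendolyn", 50, 1976),
    ("Gauss", "Greta", 40, 1975), ("Gauss", "Greta", 50, 1976),
    ("Pythagoras", "Peter", 15, 1975), ("Pythagoras", "Peter", 20, 1976),
]


def value(t, attrs):
    """t[X]: the components of tuple t for the attributes in attrs."""
    return tuple(t[ATTRS.index(a)] for a in attrs)


def determines(x, y):
    """X -> Y, i.e. theta(X) <= theta(Y) by Lemma 1."""
    seen = {}
    return all(seen.setdefault(value(t, x), value(t, y)) == value(t, y) for t in ROWS)


def minimal_sets(pred):
    """Subsets satisfying pred that contain no smaller subset satisfying pred."""
    found = []
    for k in range(len(ATTRS) + 1):
        for c in combinations(ATTRS, k):
            if pred(c) and not any(set(m) <= set(c) for m in found):
                found.append(c)
    return ["".join(m) for m in found]


def ideal_terms(a):
    """Terms of the simplified principal ideal function generated by attribute a."""
    return minimal_sets(lambda x: determines(x, a))


def keys():
    """Minimal superkeys: the prime implicants of the key Boolean function."""
    return minimal_sets(lambda x: determines(x, ATTRS))


nonprime = [a for a in ATTRS if not any(a in k for k in keys())]
```

On Table I this yields the terms `E, C, SY` for $f_E$, `C` for $f_C$, `S, EY, CY` for $f_S$, and `Y, ES, CS` for $f_Y$, with keys `CS` and `CY` and `E` as the only nonprime attribute.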

V. BOYCE-CODD NORMAL FORM 

Normalization is a logical database design process that can be viewed
as the decomposition of a relation into a set of subrelations, such that
the original relation can be regenerated by the joins of the subrelations.
The purpose of decomposition is to separate the independent
components into distinct relations, to avoid update anomalies [2]. It is claimed
in Ref. 4 that the Boyce-Codd Normal Form is one that is free of
insertion and deletion anomalies. This section is devoted to the BCNF
and its relation lattice. A modified algorithm for synthesizing an
information-lossless BCNF [6] is included, based on the concept of the
principal filter of the relation lattice.

Recall that a functional dependency $X \to Y$ is trivial if $Y \subseteq X$. A
relation $R[\Omega]$ is said to be in Boyce-Codd Normal Form if, for all
nontrivial FDs $X \to Y$, $X$ is a superkey [2,4].

Definition 3: Relation $R[\Omega]$ is in BCNF if $X \to Y$ implies either

1. $X$ is a superkey, i.e., $\theta(X) = 0$,
or

2. $Y \subseteq X$. ■

If a relation is in BCNF, we will show that its relation lattice has
some special properties. To analyze these properties we need the
concept of the principal filter [18].

An ideal of the dual of the lattice $L$ is called a filter of $L$. A subset
$M$ of $L$ is a filter of $L$ if

1. $a \in M$, $x \in L$, and $x \ge a$ imply $x \in M$,

2. $a, b \in M$ implies $a \cdot b \in M$.

For every $a \in L$, the subset of all elements "greater than or equal to"
$a$ is a filter; it is called the principal filter of $L$ generated by $a$, and is
denoted by $[a)$, i.e.,

$$[a) = \{x \in L \mid x \ge a\}.$$

If $a$ and $b$ are elements of a lattice $L$, where $a < b$, and there is no
$c \in L$ such that $a < c < b$, then we say that $a$ is covered by $b$ (or $b$ covers
$a$) [18]. An element that covers the universal lower bound $0$ of $L$ is
referred to as an atom of $L$ [18].

Definition 4: Let $R$ be a relation on the set of attributes $\Omega$, and let $\pi$
be an atom of the relation lattice $L(R[\Omega])$. Let
$\Omega_\pi = \{A \mid A \in \Omega,\ \theta(A) \ge \pi\} \subseteq \Omega$. Then the projection $R[\Omega_\pi]$ of $R$ on $\Omega_\pi$ is called an atomic
projection, and $[\pi)$ is called an atomic filter. ■

It is easy to verify that the relation lattice of the atomic projection
$R[\Omega_\pi]$ is isomorphic to the principal filter $[\pi)$ of $L(R[\Omega])$ generated by
$\pi$.

Definition 5: Let $R$ be a relation on the set of attributes $\Omega$, and let
$\pi \in L(R[\Omega])$ be an atom. The principal filter $[\pi)$ of $L(R[\Omega])$ is called
normal iff whenever $X \to Y$ is valid in the atomic projection $R[\Omega_\pi]$
then $Y \subseteq X$; otherwise, it is called abnormal. ■
Lemma 3: A relation $R[\Omega]$ is in BCNF iff every atomic filter of $L(R[\Omega])$
is normal.

Proof: (Necessity) Trivial.

(Sufficiency) Suppose $X \to Y$ and $X$ is not a superkey, i.e., $\theta(X) \neq 0$.
Then there must exist an atom $\pi$, such that

$$0 < \pi \le \theta(X) \le \theta(Y).$$

It follows that $X, Y \subseteq \Omega_\pi$ and that $X \to Y$ is valid in the atomic
projection $R[\Omega_\pi]$, which is assumed normal. Therefore $Y \subseteq X$. ■
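The join operation in the Boolean algebra $(2^\Omega, \cdot, +, \bar{\ })$ is not always
preserved by the mapping $\theta$. But for a relation $R[\Omega]$ in BCNF, if $X, Y \subseteq \Omega$
and neither $X$ nor $Y$ is a superkey of $R[\Omega]$, then the join $X + Y$
is preserved by $\theta$. We have

Corollary 2: If $R[\Omega]$ is in BCNF, $X, Y \subseteq \Omega$, $\theta(X) \neq 0$, and $\theta(Y) \neq 0$,
then

$$\theta(X + Y) = \theta(X) + \theta(Y).$$

Proof: Since $X + Y \subseteq X$ and $X + Y \subseteq Y$, we have

$$\theta(X) \le \theta(X + Y) \text{ and } \theta(Y) \le \theta(X + Y).$$

By definition of the join operation, we have

$$\theta(X) + \theta(Y) \le \theta(X + Y).$$

Suppose there is a $Z \subseteq \Omega$ such that

$$\theta(X) \le \theta(Z) \text{ and } \theta(Y) \le \theta(Z).$$

Given $\theta(X) \neq 0$ and $\theta(Y) \neq 0$, neither $X$ nor $Y$ is a superkey, so the
FDs $X \to Z$ and $Y \to Z$ must be trivial in the BCNF relation $R[\Omega]$, and we have

$$Z \subseteq X \text{ and } Z \subseteq Y.$$

Thus,

$$Z \subseteq X + Y,$$

so that

$$\theta(X + Y) \le \theta(Z).$$

By the definition of least upper bound, we have

$$\theta(X + Y) = \theta(X) + \theta(Y). \quad ■$$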

The most important characteristic of the BCNF is given in the
following theorem.

Theorem 4: The relation $R[\Omega]$ is in BCNF iff every atomic filter $[\pi)$ of
$L(R[\Omega])$ is isomorphic to the Boolean algebra $(2^{\Omega_\pi}, \cdot, +, \bar{\ })$.
Proof: (Necessity) Since $[\pi)$ is a meet-morphic image of $\theta$ restricted to
$2^{\Omega_\pi}$, it is sufficient to show that $\theta$ is a one-to-one mapping on $2^{\Omega_\pi}$. Let
$X, Y \in 2^{\Omega_\pi}$, and $\theta(X) = \theta(Y)$. It follows that

$$\theta(X) = \theta(Y - X)\theta(Y + X) \le \theta(Y - X),$$

which implies

$$X \to Y - X.$$

Since $0 < \pi \le \theta(X)$ and $[\pi)$ is normal, we have

$$Y - X \subseteq X.$$

Hence,

$$Y \subseteq X.$$

Similarly,

$$X \subseteq Y.$$

Therefore, $X = Y$, and $\theta$ is a one-to-one mapping on $2^{\Omega_\pi}$.

(Sufficiency) Suppose $X \to Y$ is valid in $R[\Omega_\pi]$. Then $\theta(X) \le \theta(Y)$.
Since the inverse of an isomorphism is also order-preserving, it follows
that $X \supseteq Y$. Therefore, $[\pi)$ is normal and $R[\Omega]$ is in BCNF. ■
The above theorem implies that if $[\pi)$ is normal, the only key of
$R[\Omega_\pi]$ is $\theta^{-1}(\pi) = \Omega_\pi$.

It is known that any relation has a lossless-join decomposition into
Boyce-Codd Normal Form, and an algorithm for determining the
decomposition is given in Ref. 6. We will show how the concept of the
principal filter can be used to modify this algorithm. In the algorithm
for synthesizing the Third Normal Form [5], a concept similar to the
principal filter is used implicitly by Bernstein when he partitions the
functional dependencies (Step 2). Before describing the improved
algorithm, we need the following:

Lemma 4: Let $R$ be a relation on $\Omega$. Let $\pi \in L(R[\Omega])$ be an atom of the
relation lattice, and let $K$ be a key of the atomic projection $R[\Omega_\pi]$. Then,

$$R[\Omega] = R[(\Omega - \Omega_\pi)K] \bowtie R[\Omega_\pi].$$

Proof: $K \subseteq \Omega_\pi$ and $K \to \Omega_\pi$. ■

Table II. Relation R[MSPCNY]

    Tuple   Model    Serial   Price   Color   Name     Year
            Number   Number
    1       1234     342      13.25   blue    pot      1974
    2       1234     347      13.25   red     pot      1974
    3       1234     410      14.23   red     pot      1975
    4       1465     347       9.45   black   pan      1974
    5       1465     390       9.82   black   pan      1976
    6       1465     392       9.82   red     pan      1976
    7       1465     401       9.82   red     pan      1976
    8       1465     409       9.82   blue    pan      1976
    9       1623     311      22.34   blue    kettle   1973
    10      1623     390      30.21   blue    kettle   1976
    11      1623     410      28.55   black   kettle   1975
    12      1623     423      28.55   black   kettle   1975
    13      1623     428      28.55   blue    kettle   1975
    14      1654     435      28.55   red     kettle   1975

The algorithm for determining the lossless-join decomposition into
BCNF is simply to construct a sequence of decompositions
$D_i = (R_1, \ldots, R_m)$ of $R$, each with lossless join: Initially, let $D_0$ consist of $R$
alone. If $T[\Omega]$ is a relation in $D_i$, and $T[\Omega]$ is not in BCNF, let $\pi$ be an
atom of $L(T[\Omega])$ for which the principal filter $[\pi)$ is abnormal. Let $K$
be a key of the atomic projection $T[\Omega_\pi]$. Now replace $T[\Omega]$ in $D_i$ by
$T[(\Omega - \Omega_\pi)K]$ and $T[\Omega_\pi]$ to obtain $D_{i+1}$. Continue the process until all
the relations in the decomposition $D_k$ are in BCNF.

Example 3: Let us consider the relation R[MSPCNY] from Ref. 23,
where M = model number, S = serial number, P = price, C = color,
N = name, and Y = year. The tuples of the relation R[MSPCNY] are
shown in Table II.

The Hasse diagram of the relation lattice $L(R[\Omega])$ is illustrated in
Fig. 2, where

$\pi_1 = \{1, 8, 9, 10, 13;\ 2, 3, 6, 7, 14;\ 4, 5, 11, 12\}$,
$\pi_2 = \{1, 2, 3;\ 4, 5, 6, 7, 8;\ 9, 10, 11, 12, 13, 14\}$,
$\pi_3 = \{1, 2, 4;\ 3, 11, 12, 13, 14;\ 5, 6, 7, 8, 10;\ 9\}$,
$\pi_4 = \{1, 2, 3;\ 4, 5, 6, 7, 8;\ 9, 10, 11, 12, 13;\ 14\}$,
$\pi_5 = \{1, 2;\ 3;\ 4;\ 5, 6, 7, 8;\ 9;\ 10;\ 11, 12, 13, 14\}$,
$\pi_6 = \{1;\ 2, 3;\ 4, 5;\ 6, 7;\ 8;\ 9, 10, 13;\ 11, 12;\ 14\}$,
$\pi_7 = \{1, 2;\ 3;\ 4;\ 5, 6, 7, 8;\ 9;\ 10;\ 11, 12, 13;\ 14\}$,
$\pi_8 = \{1;\ 2, 4;\ 3, 11;\ 5, 10;\ 6;\ 7;\ 8;\ 9;\ 12;\ 13;\ 14\}$,
$\pi_9 = \{1;\ 2;\ 3;\ 4;\ 5;\ 6, 7;\ 8;\ 9;\ 10;\ 11, 12;\ 13;\ 14\}$,

and $\theta(C) = \pi_1$, $\theta(N) = \pi_2$, $\theta(Y) = \pi_3$, $\theta(M) = \pi_4$, $\theta(P) = \pi_5$,
$\theta(S) = \pi_8$. For $X \subseteq \Omega$, $\theta(X)$ can be obtained easily by carrying out
the meet operations on the attributes in X.

Fig. 2. Relation lattice $L(R[MSPCNY])$.
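The partitions induced by attribute sets can be recomputed directly from Table II. The following is a minimal sketch, not the paper's procedure: the helper names `theta` and `meet` are illustrative assumptions, and a partition is represented as a set of frozensets of tuple numbers.

```python
# Tuples of Table II keyed by tuple number; columns follow ATTRS.
ATTRS = "MSPCNY"
ROWS = {
    1: (1234, 342, 13.25, "blue", "pot", 1974),
    2: (1234, 347, 13.25, "red", "pot", 1974),
    3: (1234, 410, 14.23, "red", "pot", 1975),
    4: (1465, 347, 9.45, "black", "pan", 1974),
    5: (1465, 390, 9.82, "black", "pan", 1976),
    6: (1465, 392, 9.82, "red", "pan", 1976),
    7: (1465, 401, 9.82, "red", "pan", 1976),
    8: (1465, 409, 9.82, "blue", "pan", 1976),
    9: (1623, 311, 22.34, "blue", "kettle", 1973),
    10: (1623, 390, 30.21, "blue", "kettle", 1976),
    11: (1623, 410, 28.55, "black", "kettle", 1975),
    12: (1623, 423, 28.55, "black", "kettle", 1975),
    13: (1623, 428, 28.55, "blue", "kettle", 1975),
    14: (1654, 435, 28.55, "red", "kettle", 1975),
}


def theta(attrs):
    """Partition of tuple numbers: tuples agreeing on `attrs` share a block."""
    blocks = {}
    for tid, row in ROWS.items():
        key = tuple(row[ATTRS.index(a)] for a in attrs)
        blocks.setdefault(key, set()).add(tid)
    return {frozenset(b) for b in blocks.values()}


def meet(p, q):
    """Meet of two partitions: all nonempty pairwise block intersections."""
    return {a & b for a in p for b in q if a & b}


# theta of a set of attributes is the meet of the single-attribute partitions.
theta_MP = meet(theta("M"), theta("P"))
```

Under this representation, `theta("C")` reproduces the three blocks listed for the color attribute, and `theta_MP` coincides with `theta("MP")`.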

The principal ideal functions of R[MSPCNY] are

$f_C(M, S, P, C, N, Y) = C + MS + NS + PS$,
$f_N(M, S, P, C, N, Y) = N + M + P + CY + CS$,
$f_P(M, S, P, C, N, Y) = P + MY + CS + MS + NS$,
$f_M(M, S, P, C, N, Y) = M + CN + CP + CY + NS + PS + CS$,
$f_Y(M, S, P, C, N, Y) = Y + P + S$,
$f_S(M, S, P, C, N, Y) = S$,

and the key Boolean function is

$F(M, S, P, C, N, Y) = (C + MS + NS + PS) \cdot (N + M + P + CY + CS)$
$\qquad \cdot\, (P + MY + CS + MS + NS)$
$\qquad \cdot\, (M + CN + CP + CY + NS + PS + CS)$
$\qquad \cdot\, (Y + P + S) \cdot S$
$\qquad = CS + MS + NS + PS.$

The keys of R[MSPCNY] are {CS, MS, NS, PS}, and Y is the only
nonprime attribute.
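The key list can be cross-checked against the tuples of Table II by brute force. This is only a sketch under stated assumptions: it enumerates attribute subsets instead of computing prime implicants as the paper does, and the function names are hypothetical.

```python
from itertools import combinations

ATTRS = "MSPCNY"
ROWS = [
    (1234, 342, 13.25, "blue", "pot", 1974),
    (1234, 347, 13.25, "red", "pot", 1974),
    (1234, 410, 14.23, "red", "pot", 1975),
    (1465, 347, 9.45, "black", "pan", 1974),
    (1465, 390, 9.82, "black", "pan", 1976),
    (1465, 392, 9.82, "red", "pan", 1976),
    (1465, 401, 9.82, "red", "pan", 1976),
    (1465, 409, 9.82, "blue", "pan", 1976),
    (1623, 311, 22.34, "blue", "kettle", 1973),
    (1623, 390, 30.21, "blue", "kettle", 1976),
    (1623, 410, 28.55, "black", "kettle", 1975),
    (1623, 423, 28.55, "black", "kettle", 1975),
    (1623, 428, 28.55, "blue", "kettle", 1975),
    (1654, 435, 28.55, "red", "kettle", 1975),
]


def is_superkey(attrs):
    """True iff no two tuples agree on every attribute in `attrs`."""
    seen = {tuple(row[ATTRS.index(a)] for a in attrs) for row in ROWS}
    return len(seen) == len(ROWS)


def candidate_keys():
    """Minimal superkeys, found by scanning subsets in order of size."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            candidate = frozenset(combo)
            # Skip supersets of keys already found: they are not minimal.
            if any(k <= candidate for k in keys):
                continue
            if is_superkey(combo):
                keys.append(candidate)
    return set(keys)


keys = candidate_keys()
nonprime = set(ATTRS) - set().union(*keys)
```

The enumeration is exponential in the number of attributes, so it is useful only as a check on small examples such as this one.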

Initially, let $D_0 = \{R[MSPCNY]\}$. Since both atomic filters $[\pi_8)$ and
$[\pi_9)$ are abnormal, we arbitrarily choose $\pi_9$, and let
$\Sigma = \Omega_{\pi_9} = MPCNY$. The relation lattice of $R[\Sigma]$ is isomorphic to
$[\pi_9)$. The principal ideal functions of $R[\Sigma]$ are

$g_C(M, P, C, N, Y) = f_C(M, 0, P, C, N, Y) = C$,
$g_N(M, P, C, N, Y) = f_N(M, 0, P, C, N, Y) = N + M + P + CY$,
$g_P(M, P, C, N, Y) = f_P(M, 0, P, C, N, Y) = P + MY$,
$g_M(M, P, C, N, Y) = f_M(M, 0, P, C, N, Y) = M + CN + CP + CY$,
$g_Y(M, P, C, N, Y) = f_Y(M, 0, P, C, N, Y) = Y + P$,

and the key Boolean function is

$G(M, P, C, N, Y) = C \cdot (N + M + P + CY) \cdot (P + MY)$
$\qquad \cdot\, (M + CN + CP + CY) \cdot (Y + P)$
$\qquad = CP + CMY.$
We choose the key CP and replace R[MSPCNY] in $D_0$ by
$R[(\Omega - \Sigma)K] = R[SPC]$ and $R[\Sigma] = R[MPCNY]$ to obtain
$D_1 = \{R[SPC], R[MPCNY]\}$. The relation R[SPC] and its lattice are shown
in Table III and Fig. 3, respectively.

The relation R[SPC] is in BCNF, but the relation R[MPCNY] is
not. The relation lattice of R[MPCNY] is isomorphic to the filter $[\pi_9)$.
We will not duplicate the figure. Both "atoms" $\pi_6$ and $\pi_7$ of $[\pi_9)$ are
abnormal. We choose the filter $[\pi_7)$. The principal ideal functions of
$R[\Sigma_{\pi_7}] = R[MPNY]$ are

$h_M(M, P, N, Y) = g_M(M, P, 0, N, Y) = M$,
$h_N(M, P, N, Y) = g_N(M, P, 0, N, Y) = N + M + P$,
$h_P(M, P, N, Y) = g_P(M, P, 0, N, Y) = P + MY$,
$h_Y(M, P, N, Y) = g_Y(M, P, 0, N, Y) = Y + P$,

and the key Boolean function of R[MPNY] is given by

$H(M, P, N, Y) = M \cdot (N + M + P) \cdot (P + MY) \cdot (Y + P) = MP + MY.$

We choose the key $K' = MP$ and replace R[MPCNY] in $D_1$ by
$R[(\Sigma - \Sigma_{\pi_7})K'] = R[MPC]$ and R[MPNY] to obtain
$D_2 = \{R[SPC], R[MPC], R[MPNY]\}$. The relation R[MPC] and its relation
lattice are illustrated in Table IV and Fig. 4, respectively.

Now we have to decompose the relation R[MPNY] in $D_2$. The
relation lattice of R[MPNY] is isomorphic to $[\pi_7)$ of $L(R[MSPCNY])$.
We choose the abnormal filter that is isomorphic to $[\pi_5)$. Since
$\Sigma_{\pi_5} = PNY$ and the only key is P, we can replace R[MPNY] in $D_2$ by
R[MP] and R[PNY] to obtain $D_3 = \{R[SPC], R[MPC], R[MP], R[PNY]\}$. All
the relations in $D_3$ are in BCNF. The relations R[MP], R[PNY] and
their respective lattices are shown in Tables V and VI, and Figs. 5 and
6. ■

Table III. Relation R[SPC]

Tuple   Serial
        Number   Price   Color
  1     342      13.25   blue
  2     347      13.25   red
  3     410      14.23   red
  4     347       9.45   black
  5     390       9.82   black
  6     392       9.82   red
  7     401       9.82   red
  8     409       9.82   blue
  9     311      22.34   blue
 10     390      30.21   blue
 11     410      28.55   black
 12     423      28.55   black
 13     428      28.55   blue
 14     435      28.55   red

Fig. 3. Relation lattice $L(R[SPC])$.

Table IV. Relation R[MPC]

Tuple     Model
          Number   Price   Color
  1       1234     13.25   blue
  2       1234     13.25   red
  3       1234     14.23   red
  4       1465      9.45   black
  5       1465      9.82   black
 (6,7)    1465      9.82   red
  8       1465      9.82   blue
  9       1623     22.34   blue
 10       1623     30.21   blue
(11,12)   1623     28.55   black
 13       1623     28.55   blue
 14       1654     28.55   red

Fig. 4. Relation lattice $L(R[MPC])$.
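That the final decomposition of Example 3 into SPC, MPC, MP, and PNY loses no information can be checked by joining the four projections of Table II back together. The sketch below assumes a simple nested-loop natural join; the names `project`, `natural_join`, and `is_lossless` are illustrative and not from the paper.

```python
from functools import reduce

ATTRS = "MSPCNY"
ROWS = [
    (1234, 342, 13.25, "blue", "pot", 1974),
    (1234, 347, 13.25, "red", "pot", 1974),
    (1234, 410, 14.23, "red", "pot", 1975),
    (1465, 347, 9.45, "black", "pan", 1974),
    (1465, 390, 9.82, "black", "pan", 1976),
    (1465, 392, 9.82, "red", "pan", 1976),
    (1465, 401, 9.82, "red", "pan", 1976),
    (1465, 409, 9.82, "blue", "pan", 1976),
    (1623, 311, 22.34, "blue", "kettle", 1973),
    (1623, 390, 30.21, "blue", "kettle", 1976),
    (1623, 410, 28.55, "black", "kettle", 1975),
    (1623, 423, 28.55, "black", "kettle", 1975),
    (1623, 428, 28.55, "blue", "kettle", 1975),
    (1654, 435, 28.55, "red", "kettle", 1975),
]
# A tuple is a frozenset of (attribute, value) pairs, so projections of
# different tuples that agree on the kept attributes collapse automatically.
R = {frozenset(zip(ATTRS, row)) for row in ROWS}


def project(rel, attrs):
    """Projection of `rel` onto the attributes named in `attrs`."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r1, r2):
    """Natural join: merge every pair of tuples agreeing on shared attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def is_lossless(schemas):
    """True iff joining the projections onto `schemas` gives back R exactly."""
    return reduce(natural_join, (project(R, s) for s in schemas)) == R


d3_is_lossless = is_lossless(["SPC", "MPC", "MP", "PNY"])
```

For contrast, a split such as MS and PCNY shares no attribute, so its join is a Cartesian product and the same check fails.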

Table V. Relation R[MP]

Tuple          Model
               Number   Price
(1,2)          1234     13.25
 3             1234     14.23
 4             1465      9.45
(5,6,7,8)      1465      9.82
 9             1623     22.34
10             1623     30.21
(11,12,13)     1623     28.55
14             1654     28.55

Table VI. Relation R[PNY]

Tuple             Price   Name     Year
(1,2)             13.25   pot      1974
 3                14.23   pot      1975
 4                 9.45   pan      1974
(5,6,7,8)          9.82   pan      1976
 9                22.34   kettle   1973
10                30.21   kettle   1976
(11,12,13,14)     28.55   kettle   1975

Fig. 5. Relation lattice $L(R[MP])$.

Fig. 6. Relation lattice $L(R[PNY])$.

VI. MULTIVALUED DEPENDENCIES

Multivalued dependency (MVD), proposed by Fagin (Ref. 7) and Zaniolo
(Ref. 8), is the necessary and sufficient condition for a (binary)
lossless-join decomposition. A similar concept, called hierarchical
dependency, was defined by Delobel (Ref. 24). A bit later, the concept of
multivalued dependency was generalized to join dependency by Rissanen
(Refs. 10, 11). A set of "axioms" or inference rules for multivalued
dependencies was given by Beeri, Fagin, and Howard (Ref. 25). We know
from our previous discussion that functional dependency is equivalent to
partial ordering in the partition lattice. In this section we show that
multivalued dependency is equivalent to a lattice equation. First,
however, we state the definition of MVD and show that MVD guarantees
information-lossless join decomposition.

Definition 5: Let R be a relation on the set of attributes $\Omega = XYZ$,
where X, Y, and Z are disjoint subsets of $\Omega$. We say there is a
multivalued dependency $X \twoheadrightarrow Y$ if

$$R_{xz}[Y] = R_x[Y], \quad \forall (x) \in R[X],\ (z) \in R[Z].$$ ■

Lemma 5: Let R be a relation on $\Omega = XYZ$, where X, Y, and Z are
disjoint subsets. Then,

$$R[XYZ] = R[XY] \bowtie R[XZ]$$

iff

$$|R_x[YZ]| = |R_x[Y]| \cdot |R_x[Z]|, \quad \forall (x) \in R[X].$$

Proof: (Necessity) $R[XYZ] = R[XY] \bowtie R[XZ]$ implies

$$R_x[YZ] = R_x[Y] \times R_x[Z], \quad \forall (x) \in R[X].$$

Hence,

$$|R_x[YZ]| = |R_x[Y]| \cdot |R_x[Z]|.$$

(Sufficiency) It is easy to verify that

$$R_x[YZ] \subseteq R_x[Y] \times R_x[Z], \quad \forall (x) \in R[X].$$

The given cardinal identity assures that

$$R_x[YZ] = R_x[Y] \times R_x[Z], \quad \forall (x) \in R[X].$$ ■

Theorem 5: Let R be a relation on the set of attributes $\Omega = XYZ$, where
X, Y, and Z are disjoint subsets.* Then,

$$R[XYZ] = R[XY] \bowtie R[XZ] \quad \text{iff} \quad X \twoheadrightarrow Y.$$

* For convenience, we assume X, Y, and Z to be disjoint. It will later
become clear that this assumption is not necessary.
Proof: (Necessity) From Lemma 5, it is sufficient to show that

$$|R_x[YZ]| = |R_x[Y]| \cdot |R_x[Z]|, \quad \forall (x) \in R[X]$$

iff

$$R_{xz}[Y] = R_x[Y], \quad \forall (x) \in R[X],\ (z) \in R[Z].$$

Since

$$R[XYZ] = R[XY] \bowtie R[XZ]$$

implies

$$R_{xz}[Y] \times (x, z) = R_x[Y] \times (x, z), \quad \forall (x) \in R[X],\ (z) \in R[Z].$$

Hence,

$$R_{xz}[Y] = R_x[Y].$$

(Sufficiency) For every $(x) \in R[X]$, we have

$$(x) \times R_x[ZY] = \{(x, z_i) \times R_{x z_i}[Y] \mid (x, z_i) \in R[XZ]\}$$
$$= \{(x, z_i) \times R_x[Y] \mid (x, z_i) \in R[XZ]\}$$
$$= (x) \times R_x[Z] \times R_x[Y].$$

Since $|x| = 1$, it follows that

$$|R_x[YZ]| = |R_x[Y]| \cdot |R_x[Z]|, \quad \forall (x) \in R[X].$$ ■
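Theorem 5 can be illustrated on a small relation. The sketch below is an assumption-laden example: the relation `good` (course, teacher, book) is hypothetical and not taken from the paper, and the quantifier in Definition 5 is read as ranging over the pairs (x, z) that actually occur in R[XZ].

```python
from collections import defaultdict


def satisfies_mvd(rel):
    """Definition 5 for triples (x, y, z): R_xz[Y] = R_x[Y] for each (x, z)."""
    y_by_x = defaultdict(set)
    y_by_xz = defaultdict(set)
    for x, y, z in rel:
        y_by_x[x].add(y)
        y_by_xz[(x, z)].add(y)
    return all(ys == y_by_x[x] for (x, _z), ys in y_by_xz.items())


def lossless_binary(rel):
    """True iff R[XYZ] equals the natural join of R[XY] and R[XZ]."""
    xy = {(x, y) for x, y, _z in rel}
    xz = {(x, z) for x, _y, z in rel}
    joined = {(x, y, z) for x, y in xy for x2, z in xz if x == x2}
    return joined == set(rel)


# Hypothetical data: teachers and books vary independently within a course.
good = {
    ("db", "ann", "codd"), ("db", "ann", "date"),
    ("db", "bob", "codd"), ("db", "bob", "date"),
    ("os", "cat", "tanenbaum"),
}
# Removing one tuple breaks the independence, and with it the lossless join.
bad = good - {("db", "bob", "date")}
```

On both relations the two predicates agree, as the theorem requires: they hold together for `good` and fail together for `bad`.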

We need the commutative property of the product of two equivalence
relations (partitions) to establish the lattice equation of multivalued
dependency. The product of two equivalence relations may not be an
equivalence relation; it is an equivalence relation if and only if the
product is commutative.

Definition 6: Two binary relations $\rho$ and $\rho'$ on S are permutable
(commute) if and only if $\rho \circ \rho' = \rho' \circ \rho$. This means that if
$a\,\rho\,x\,\rho'\,b$ for some $x \in S$, then $a\,\rho'\,y\,\rho\,b$ for some $y \in S$, and
conversely (Ref. 18). ■

Lemma 6: Let $\rho$ and $\rho'$ be equivalence relations (partitions) on S. Then
the following are equivalent:

1. $\rho \circ \rho' = \rho' \circ \rho$

2. $\rho \circ \rho' = \rho \oplus \rho'$

3. $\rho \circ \rho'$ is an equivalence relation

4. $\rho \circ \rho'$ is symmetric.

Proof: The proof of Lemma 6 is given in Ref. 21. ■

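Lemma 7: Let R be a relation on the set of attributes $\Omega$ and let
$X, Y, Z \subseteq \Omega$. Then,

$$\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY)$$

iff

$$\theta(X) \subseteq \theta(XY) \circ \theta(XZ).$$

Proof: (Necessity) Trivial.

(Sufficiency) Suppose $t_1\,\theta(XY) \circ \theta(XZ)\,t_2$. Then there exists
$t_3 \in R[\Omega]$ such that

$$t_1\,\theta(XY)\,t_3\,\theta(XZ)\,t_2,$$

which implies

$$t_1\,\theta(X)\,t_3\,\theta(X)\,t_2.$$

Therefore,

$$t_1\,\theta(X)\,t_2.$$

Hence,

$$\theta(XY) \circ \theta(XZ) \subseteq \theta(X).$$

It follows that

$$\theta(X) = \theta(XY) \circ \theta(XZ).$$

From Lemma 6, we have

$$\theta(X) = \theta(XY) \oplus \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY).$$

Since

$$\theta(XY) \le \theta(XY) + \theta(XZ) \quad \text{and} \quad \theta(XZ) \le \theta(XY) + \theta(XZ),$$

by the definition of the join operation in $\Pi(R[\Omega])$, we must have

$$\theta(X) = \theta(XY) \oplus \theta(XZ) \le \theta(XY) + \theta(XZ).$$

But,

$$\theta(XY) + \theta(XZ) \le \theta(X) + \theta(X) = \theta(X),$$

so it follows that

$$\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY).$$ ■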

The following theorem shows that the multivalued dependency can
be formulated as a lattice equation.

Theorem 6: Let R be a relation on the set of attributes $\Omega = XYZ$, where
X, Y, and Z are disjoint subsets. Then, $R[XYZ] = R[XY] \bowtie R[XZ]$

iff

$$\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY).$$

Proof: (Necessity) Since $R[XYZ] = R[XY] \bowtie R[XZ]$ implies

$$R_x[YZ] = R_x[Y] \times R_x[Z], \quad \forall (x) \in R[X],$$

there is a one-to-one and onto mapping $\phi_x : R_x[YZ] \to R_x[Y] \times R_x[Z]$,
which takes every tuple $(y, z) \in R_x[YZ]$ into
$\phi_x((y, z)) = ((y), (z)) \in R_x[Y] \times R_x[Z]$, $\forall (x) \in R[X]$. Suppose
$t_1, t_2 \in R[XYZ]$ and $t_1\,\theta(X)\,t_2$, and assume $t_1 = (x, y_1, z_1)$ and
$t_2 = (x, y_2, z_2)$. As

$$(y_1, z_1),\ (y_2, z_2) \in R_x[YZ] = R_x[Y] \times R_x[Z],$$

we have

$$(y_1),\ (y_2) \in R_x[Y] \quad \text{and} \quad (z_1),\ (z_2) \in R_x[Z].$$

Since $\phi_x$ is an onto mapping, there must exist two tuples
$t_3 = (x, y_1, z_2)$ and $t_4 = (x, y_2, z_1) \in R[XYZ]$. Hence,

$$t_1\,\theta(XY)\,t_3\,\theta(XZ)\,t_2,$$

which means

$$t_1\,\theta(XY) \circ \theta(XZ)\,t_2.$$

It follows that

$$\theta(X) \subseteq \theta(XY) \circ \theta(XZ).$$

From Lemma 7, we have

$$\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY).$$

(Sufficiency) We know $R[XYZ] \subseteq R[XY] \bowtie R[XZ]$. Suppose
$t = (x, y, z) \in R[XY] \bowtie R[XZ]$. Then there exist $t_1 = (x, y, z')$,
$t_2 = (x, y', z) \in R[XYZ]$. Thus,

$$t_1\,\theta(X)\,t_2,$$

which implies

$$t_1\,\theta(XY) \circ \theta(XZ)\,t_2.$$

There must exist $t_3 \in R[XYZ]$ such that

$$t_1\,\theta(XY)\,t_3\,\theta(XZ)\,t_2.$$

Therefore,

$$t_3 = (x, y, z) = t \in R[XYZ],$$

and thus

$$R[XY] \bowtie R[XZ] \subseteq R[XYZ].$$

Hence,

$$R[XYZ] = R[XY] \bowtie R[XZ].$$ ■

It should be noted that in the above proof we use the fact that
$\Omega = XYZ$ and $\theta(\Omega) = \theta(XYZ) = 0$, i.e., there are no duplicated tuples in
$R[\Omega]$. The inference rules of MVD are given and proved in Appendix
B.

Example 4: Consider the relation R[ECSY] of Example 1. We have
the MVD $E \twoheadrightarrow SY$, where $\theta(E) = \pi_1 = \{1, 2;\ 3, 4, 5, 6;\ 7, 8\}$,
$\theta(EC) = \pi_2 = \{1, 2;\ 3, 4;\ 5, 6;\ 7, 8\}$, and
$\theta(ESY) = \pi_5 = \{1;\ 2;\ 3, 5;\ 4, 6;\ 7;\ 8\}$. It is easy to verify that

$$\pi_1 = \pi_2 + \pi_5 = \pi_2 \circ \pi_5 = \pi_5 \circ \pi_2.$$ ■
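The products in Example 4 can be checked mechanically by treating each partition as a set of ordered pairs. This is a small sketch with illustrative helper names (`as_relation`, `compose`); the second pair of partitions, `rho` and `sigma`, is a hypothetical counterexample added to show a product that does not commute.

```python
def as_relation(partition):
    """Equivalence relation of a partition, as a set of ordered pairs."""
    return {(a, b) for block in partition for a in block for b in block}


def compose(r, s):
    """Product r o s: (a, b) is included iff a r x and x s b for some x."""
    return {(a, b) for a, x in r for x2, b in s if x == x2}


# Partitions of Example 4.
pi_1 = as_relation([{1, 2}, {3, 4, 5, 6}, {7, 8}])
pi_2 = as_relation([{1, 2}, {3, 4}, {5, 6}, {7, 8}])
pi_5 = as_relation([{1}, {2}, {3, 5}, {4, 6}, {7}, {8}])
example_4_holds = compose(pi_2, pi_5) == pi_1 == compose(pi_5, pi_2)

# Hypothetical non-permutable pair on {1, 2, 3}.
rho = as_relation([{1, 2}, {3}])
sigma = as_relation([{1}, {2, 3}])
permutable = compose(rho, sigma) == compose(sigma, rho)
```

In the counterexample the product contains (1, 3) but not (3, 1), so it is not symmetric and hence not an equivalence relation, in line with Lemma 6.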

It is known that if R is a relation on $\Omega = XYZ$ and $X \twoheadrightarrow Y$, then
$X \twoheadrightarrow Z$. The symmetry of the MVD can easily be seen in the
lattice equation of Lemma 7.

If $XYZ \subseteq \Omega$ and $\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY)$
holds, then $X \twoheadrightarrow Y \mid Z$ is called an embedded multivalued
dependency (EMVD) (Ref. 7); this is simply a multivalued dependency in the
projection $R[XYZ]$ of $R[\Omega]$.

Theorem 6 clearly indicates that the MVD is actually a condition
pertaining to data independency rather than data dependency. For this
reason, we introduce the notion of decomposition of two sets of
attributes in a relation as follows.

Definition 7: Let R be a relation on the set of attributes $\Omega$. The two
sets of attributes $\Omega_1, \Omega_2 \subseteq \Omega$ are decomposable in R if

$$\theta(\Omega_1 + \Omega_2) = \theta(\Omega_1) + \theta(\Omega_2) = \theta(\Omega_1) \circ \theta(\Omega_2) = \theta(\Omega_2) \circ \theta(\Omega_1).$$ ■

It is easy to see that $\Omega_1$ and $\Omega_2$ are decomposable in R iff
$\Omega_1 + \Omega_2 \twoheadrightarrow \Omega_1 - \Omega_2 \mid \Omega_2 - \Omega_1$ is an EMVD in R. Furthermore, if
$\Omega_1\Omega_2 = \Omega$ then $\Omega_1 + \Omega_2 \twoheadrightarrow \Omega_1 - \Omega_2$ (or
$\Omega_1 + \Omega_2 \twoheadrightarrow \Omega_2 - \Omega_1$) is an MVD in R. In the latter case,
$(\Omega_1, \Omega_2)$ is called a decomposition pair by Armstrong and Delobel
(Ref. 26).

We feel that decomposition is a basic concept in the study of the
structure of databases. It can be naturally generalized to the concepts
of projective decomposition and mutual decomposition. Projective
decomposition concerns the data independence of two sets of attributes
on the projection of a relation. Mutual decomposition extends the
concept of decomposition to more than two sets of attributes.

Let $\rho$ be a partition on the set S. The function $\rho^* : S \to S/\rho$ that maps
$a \in S$ into $(a)\rho^* = a_\rho$ is called the canonical function of $\rho$. For
$S = \{a, b, \ldots, e\}$, we will use the notation

$$\rho^* = \begin{pmatrix} a & b & \cdots & e \\ a_\rho & b_\rho & \cdots & e_\rho \end{pmatrix}$$

to illustrate the canonical function $\rho^*$. The equivalence relation

$$\ker \rho^* = \rho^* \circ \rho^{*-1} = \{(a, b) \in S \times S \mid \rho^*(a) = \rho^*(b)\}$$

is called the kernel of $\rho^*$. Notice that $\ker \rho^* = \rho$.

Let $\rho$ and $\sigma$ be partitions on S with $\rho \le \sigma$; then there is a unique
function f from $S/\rho$ onto $S/\sigma$ such that $(a_\rho)f = a_\sigma$. The kernel of f,

$$\ker f = f \circ f^{-1} = \{(a_\rho, b_\rho) \in S/\rho \times S/\rho \mid a\,\sigma\,b\},$$

is an equivalence on $S/\rho$. It is usual to write $\ker f$ as $\sigma/\rho$, the quotient
of $\sigma$ and $\rho$. Note that $a_\rho\,(\sigma/\rho)\,b_\rho$ if and only if $a\,\sigma\,b$, and the mapping
$g : (S/\rho)/(\sigma/\rho) \to S/\sigma$ defined by $((a_\rho)_{\sigma/\rho})g = a_\sigma$ is one-to-one and onto.
Thus the function f defined above is in fact the canonical function of
$\sigma/\rho$, i.e., $f = (\sigma/\rho)^*$. It is easy to see that the diagram in Fig. 7
commutes, that is, $\rho^* \circ (\sigma/\rho)^* = \sigma^*$.

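Example 5: Let $\rho$, $\sigma$ be partitions on the set $S = \{1, 2, 3, 4, 5, 6, 7, 8\}$,
such that $\rho = \{1, 2;\ 3, 4;\ 5, 6;\ 7, 8\}$ and $\sigma = \{1, 2, 3, 4;\ 5, 6, 7, 8\}$ with
$\rho \le \sigma$. Then $S/\rho = \{I, II, III, IV\}$, where $I = \{1, 2\}$, $II = \{3, 4\}$,
$III = \{5, 6\}$, $IV = \{7, 8\}$, and $S/\sigma = \{\alpha, \beta\}$, where $\alpha = \{1, 2, 3, 4\}$,
$\beta = \{5, 6, 7, 8\}$. The canonical functions of $\rho$ and $\sigma$ are

$$\rho^* = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ I & I & II & II & III & III & IV & IV \end{pmatrix}$$

and

$$\sigma^* = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \alpha & \alpha & \alpha & \alpha & \beta & \beta & \beta & \beta \end{pmatrix}.$$

It follows that

$$\sigma/\rho = \{I, II;\ III, IV\}$$

and

$$(\sigma/\rho)^* = \begin{pmatrix} I & II & III & IV \\ \alpha & \alpha & \beta & \beta \end{pmatrix}.$$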
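The quotient partition of Example 5 can be computed by grouping the blocks of the finer partition according to the block of the coarser partition that contains them. The function name `quotient` and the frozenset representation are assumptions made for this sketch.

```python
def quotient(sigma, rho):
    """Quotient sigma/rho: a partition whose elements are blocks of rho.

    Two blocks of rho fall in the same class iff they lie inside the same
    block of sigma. Requires rho <= sigma, i.e., rho refines sigma.
    """
    rho_blocks = [frozenset(b) for b in rho]
    sigma_blocks = [frozenset(b) for b in sigma]
    if not all(any(b <= s for s in sigma_blocks) for b in rho_blocks):
        raise ValueError("rho must refine sigma")
    return {frozenset(b for b in rho_blocks if b <= s) for s in sigma_blocks}


# Partitions of Example 5.
rho = [{1, 2}, {3, 4}, {5, 6}, {7, 8}]
sigma = [{1, 2, 3, 4}, {5, 6, 7, 8}]
I, II, III, IV = (frozenset(b) for b in rho)
sigma_over_rho = quotient(sigma, rho)
```

Each class of the result corresponds to exactly one block of the coarser partition, which is the one-to-one and onto mapping g described in the text.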



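Lemma 8: Let $\rho$, $\sigma_1$, $\sigma_2$ be partitions on S such that $\rho \le \sigma_1$, $\rho \le \sigma_2$, and
$\sigma_1 \circ \sigma_2 = \sigma_2 \circ \sigma_1$. Then

$$(\sigma_1 \circ \sigma_2)/\rho = (\sigma_1/\rho) \circ (\sigma_2/\rho).$$

Fig. 7. Canonical function of quotient partition.

Proof: It is clear that $\sigma_1 \circ \sigma_2$ is a partition on S and
$\rho \le \sigma_1 \circ \sigma_2 = \sigma_2 \circ \sigma_1$. It follows from the definition of quotient
partition that the lemma is true. ■

Lemma 9: Let $\rho$, $\sigma_1$, $\sigma_2$ be partitions on S such that $\rho \le \sigma_1$, $\rho \le \sigma_2$.
Then

$$\sigma_1 \circ \sigma_2 = \sigma_2 \circ \sigma_1$$

iff

$$(\sigma_1/\rho) \circ (\sigma_2/\rho) = (\sigma_2/\rho) \circ (\sigma_1/\rho).$$

Proof: (Necessity)

$$(\sigma_1/\rho) \circ (\sigma_2/\rho) = (\sigma_1 \circ \sigma_2)/\rho = (\sigma_2 \circ \sigma_1)/\rho = (\sigma_2/\rho) \circ (\sigma_1/\rho).$$

(Sufficiency) Suppose $a\,\sigma_1 \circ \sigma_2\,b$. Then there is a $c \in S$ such that
$a\,\sigma_1\,c\,\sigma_2\,b$. It follows that

$$a_\rho\,(\sigma_1/\rho)\,c_\rho\,(\sigma_2/\rho)\,b_\rho.$$

There must be a $d \in S$ such that

$$a_\rho\,(\sigma_2/\rho)\,d_\rho\,(\sigma_1/\rho)\,b_\rho.$$

Thus

$$a\,\sigma_2\,d\,\sigma_1\,b$$

and

$$a\,\sigma_2 \circ \sigma_1\,b.$$

Hence

$$\sigma_1 \circ \sigma_2 \subseteq \sigma_2 \circ \sigma_1.$$

Similarly, we have $\sigma_2 \circ \sigma_1 \subseteq \sigma_1 \circ \sigma_2$. Then $\sigma_1 \circ \sigma_2 = \sigma_2 \circ \sigma_1$. ■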
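Definition 8: Let R be a relation on the set of attributes $\Omega$. For
$\Omega_1, \Omega_2 \subseteq \Omega$, the projective partition defined by

$$\theta(\Omega_1 \mid \Omega_2) = \theta(\Omega_1 + \Omega_2)/\theta(\Omega_2)$$

is a partition on the set of tuples of $R[\Omega]/\theta(\Omega_2) = R[\Omega_2]$. The canonical
function of $\theta(\Omega_1 \mid \Omega_2)$ is denoted by
$\theta^*(\Omega_1 \mid \Omega_2) = (\theta(\Omega_1 + \Omega_2)/\theta(\Omega_2))^*$, which satisfies
$\theta^*(\Omega_2) \circ \theta^*(\Omega_1 \mid \Omega_2) = \theta^*(\Omega_1 + \Omega_2)$. ■

Certain properties of the projective partition are demonstrated in
the following theorems; their proofs follow directly from the
definition of projective partition.

Theorem 7: Let R be a relation on the set of attributes $\Omega$ and
$\Omega_1, \ldots, \Omega_n \subseteq \Omega$. Then

$$\theta^*\Big(\sum_{i=1}^{n} \Omega_i\Big) = \theta^*(\Omega_1) \circ \theta^*(\Omega_2 \mid \Omega_1) \circ \cdots \circ \theta^*\Big(\Omega_n \,\Big|\, \sum_{i=1}^{n-1} \Omega_i\Big).$$ ■

Theorem 8: Let R be a relation on the set of attributes $\Omega$ and
$X, \Omega_1, \ldots, \Omega_n \subseteq \Omega$, such that

$$\bigcup_{k=1}^{n} \Omega_k = \Omega.$$

Then

1. $\theta(X) = \prod_{k=1}^{n} \ker(\theta^*(\Omega_k) \circ \theta^*(X \mid \Omega_k))$

2. $\theta(\Omega_m \mid X) = \ker(\theta^*(\Omega_m) \circ \theta^*(X \mid \Omega_m)) \big/ \prod_{k=1}^{n} \ker(\theta^*(\Omega_k) \circ \theta^*(X \mid \Omega_k))$. ■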

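Definition 9: Let R be a relation on the set of attributes $\Omega$ and
$\Omega_1, \Omega_2, \Sigma \subseteq \Omega$. We say $\Omega_1$ and $\Omega_2$ are projectively decomposable on $\Sigma$ if

$$\theta(\Omega_1 + \Omega_2 \mid \Sigma) = \theta(\Omega_1 \mid \Sigma) + \theta(\Omega_2 \mid \Sigma)$$
$$= \theta(\Omega_1 \mid \Sigma) \circ \theta(\Omega_2 \mid \Sigma)$$
$$= \theta(\Omega_2 \mid \Sigma) \circ \theta(\Omega_1 \mid \Sigma).$$ ■

The EMVD is a special case of projective decomposition, which can
be seen from the following theorem.

Theorem 9: Let R be a relation on $\Omega$, and let $\Omega_1, \Omega_2, \Sigma \subseteq \Omega$. Then $\Omega_1$
and $\Omega_2$ are projectively decomposable on $\Sigma$ iff

$$\theta(\Omega_1 + \Omega_2 + \Sigma) = \theta(\Omega_1 + \Sigma) + \theta(\Omega_2 + \Sigma)$$
$$= \theta(\Omega_1 + \Sigma) \circ \theta(\Omega_2 + \Sigma) = \theta(\Omega_2 + \Sigma) \circ \theta(\Omega_1 + \Sigma).$$

Proof: The proof follows from Lemmas 8 and 9. ■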

Example 6: Consider the relation R on $\Omega = ABCDE$ in Table VII.
The Hasse diagram of the relation lattice $L(R[\Omega])$ is shown in Fig. 8,
where

$\pi_1 = \{1, 2, 3, 5, 6, 7;\ 4\} = \theta(A)$,
$\pi_2 = \{1, 3, 4;\ 2, 5, 6, 7\} = \theta(B)$,
$\pi_3 = \{1, 6, 7;\ 2, 3, 4, 5\} = \theta(C)$,
$\pi_4 = \{1, 3;\ 2, 5, 6, 7;\ 4\} = \theta(AB)$,
$\pi_5 = \{1, 6, 7;\ 2, 3, 5;\ 4\} = \theta(AC)$,
$\pi_6 = \{1;\ 2, 5;\ 3, 4;\ 6, 7\} = \theta(BC)$,
$\pi_7 = \{1, 6, 7;\ 2, 5;\ 3;\ 4\} = \theta(D)$,
$\pi_8 = \{1, 3;\ 2, 6;\ 4;\ 5, 7\} = \theta(E)$,
$\pi_9 = \{1;\ 2, 5;\ 3;\ 4;\ 6, 7\} = \theta(ABC) = \theta(ABD)$.

Table VII. Relation $R[\Omega]$

Tuple   A    B    C    D    E
  1     a1   b1   c1   d1   e1
  2     a1   b2   c2   d2   e2
  3     a1   b1   c2   d3   e1
  4     a2   b1   c2   d4   e3
  5     a1   b2   c2   d2   e4
  6     a1   b2   c1   d1   e2
  7     a1   b2   c1   d1   e4

Fig. 8. Relation lattice $L(R[\Omega])$.

Let $\Omega_1 = ABD$, $\Omega_2 = ACE$, $\Sigma = ABC$. We find that

$\theta(\Omega_1 + \Omega_2 \mid \Sigma) = \theta(A \mid ABC) = \theta(A)/\theta(ABC) = \{I, II, III, V;\ IV\}$,
$\theta(\Omega_1 \mid \Sigma) = \theta(ABD \mid ABC) = \theta(AB)/\theta(ABC) = \{I, III;\ II, V;\ IV\}$,
$\theta(\Omega_2 \mid \Sigma) = \theta(ACE \mid ABC) = \theta(AC)/\theta(ABC) = \{I, V;\ II, III;\ IV\}$,

and

$\theta(ABC) = \{I, II, III, IV, V\}$,

where $I = \{1\}$, $II = \{2, 5\}$, $III = \{3\}$, $IV = \{4\}$, $V = \{6, 7\}$.

It is easy to see that ABD and ACE are projectively decomposable
on ABC, i.e.,

$$\theta(A \mid ABC) = \theta(ABD \mid ABC) + \theta(ACE \mid ABC)$$
$$= \theta(ABD \mid ABC) \circ \theta(ACE \mid ABC)$$
$$= \theta(ACE \mid ABC) \circ \theta(ABD \mid ABC).$$

But the MVD $A \twoheadrightarrow BD$ (or $A \twoheadrightarrow CE$) does not hold in $R[\Omega]$.
Nevertheless, the EMVD $A \twoheadrightarrow B \mid C$ does hold in $R[\Omega]$. ■
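The claims of Example 6 can be checked against Table VII. The sketch below works with the partitions of AB, AC, and A, which the example identifies as the numerators of the three projective partitions; all helper names are illustrative assumptions.

```python
ATTRS = "ABCDE"
ROWS = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c2", "d2", "e2"),
    3: ("a1", "b1", "c2", "d3", "e1"),
    4: ("a2", "b1", "c2", "d4", "e3"),
    5: ("a1", "b2", "c2", "d2", "e4"),
    6: ("a1", "b2", "c1", "d1", "e2"),
    7: ("a1", "b2", "c1", "d1", "e4"),
}


def theta(attrs):
    """Equivalence relation on tuple numbers: equal values on `attrs`."""
    def key(t):
        return tuple(ROWS[t][ATTRS.index(a)] for a in attrs)
    return {(s, t) for s in ROWS for t in ROWS if key(s) == key(t)}


def compose(r, s):
    """Product r o s of two binary relations."""
    return {(a, b) for a, x in r for x2, b in s if x == x2}


def project(attrs):
    """Projection of the relation onto `attrs`, as a set of value tuples."""
    return {tuple(row[ATTRS.index(a)] for a in attrs) for row in ROWS.values()}


def binary_join_is_lossless(x, y, z):
    """True iff R[xyz] equals the natural join of R[xy] and R[xz]."""
    n = len(x)
    joined = {t + u[n:] for t in project(x + y) for u in project(x + z)
              if t[:n] == u[:n]}
    return joined == project(x + y + z)


# Lattice equation behind the projective decomposition on ABC.
commutes = compose(theta("AB"), theta("AC")) == theta("A") == compose(theta("AC"), theta("AB"))
mvd_holds = binary_join_is_lossless("A", "BD", "CE")   # MVD A ->-> BD
emvd_holds = binary_join_is_lossless("A", "B", "C")    # EMVD A ->-> B | C
```

The MVD test fails because the join of R[ABD] and R[ACE] produces tuples absent from Table VII, while the embedded test succeeds on the projection R[ABC].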

So far we have discussed the properties of decomposition of two sets
of attributes. The concept of decomposition certainly can be extended
to any n > 2 sets of attributes. We define the notion of mutual
decomposition as follows:

Definition 10: Let R be a relation on the set of attributes $\Omega$. The
sets of attributes $\Omega_1, \Omega_2, \ldots, \Omega_n \subseteq \Omega$ are mutually decomposable if, for
any $I \subseteq N = \{1, \ldots, n\}$ and $J \subseteq N - I$, the two sets of attributes
$\Omega_I = \bigcup_{i \in I} \Omega_i$ and $\Omega_J = \bigcup_{j \in J} \Omega_j$ are decomposable. ■

Theorem 10: Let R be a relation on the set of attributes $\Omega$ and
$\Omega_1 \cdots \Omega_n = \Omega$. Suppose $\Omega_1, \ldots, \Omega_n$ are mutually decomposable. Then

$$R[\Omega] = R[\Omega_1 \cdots \Omega_n] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n].$$

Proof: It follows from the definition of mutual decomposition that

$$R[\Omega_1 \cdots \Omega_m] = R[\Omega_1 \cdots \Omega_{m-1}] \bowtie R[\Omega_m], \quad m = 2, \ldots, n.$$

Therefore the assertion is true by induction. ■

The above theorem states that mutual decomposition implies an
information-lossless join. The converse is not true in general. The
necessary and sufficient condition of an information-lossless join is
called join dependency, which will be discussed in the next section.

VII. JOIN DEPENDENCIES 

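Join dependency (JD) (Refs. 10, 11, 14) is a generalization of MVD. It
refers to a collection $\{\Omega_1, \ldots, \Omega_n\}$ of subsets of $\Omega$ such that

$$\Omega = \Omega_1 \cdots \Omega_n$$

and

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n].$$

Join dependency can be considered as a "set of coordinates" of the
relation. The connection between join dependencies and multivalued
dependencies is given by the following lemma:

Lemma 10: Let $R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$, let $N_0$ be a subset of
$\{1, \ldots, n\}$, and let $N_1 = \{1, \ldots, n\} - N_0$. Then $(\Omega_{N_0}, \Omega_{N_1})$ is a
decomposition pair, where

$$\Omega = \Omega_1 \cdots \Omega_n, \quad \Omega_{N_0} = \bigcup_{i \in N_0} \Omega_i, \quad \text{and} \quad \Omega_{N_1} = \bigcup_{i \in N_1} \Omega_i.$$

Proof: Since

$$R\Big[\bigcup_{i \in N_0} \Omega_i\Big] \subseteq \mathop{\bowtie}_{i \in N_0} R[\Omega_i]$$

and

$$R\Big[\bigcup_{i \in N_1} \Omega_i\Big] \subseteq \mathop{\bowtie}_{i \in N_1} R[\Omega_i],$$

it follows that

$$R[\Omega_{N_0}] \bowtie R[\Omega_{N_1}] \subseteq \mathop{\bowtie}_{i \in N_0} R[\Omega_i] \bowtie \mathop{\bowtie}_{i \in N_1} R[\Omega_i].$$

Since the natural join operation is commutative and associative (Ref. 6),
we have

$$R[\Omega_{N_0}] \bowtie R[\Omega_{N_1}] \subseteq R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n] = R[\Omega].$$

But we know

$$R[\Omega] \subseteq R[\Omega_{N_0}] \bowtie R[\Omega_{N_1}].$$

Hence,

$$R[\Omega] = R[\Omega_{N_0}] \bowtie R[\Omega_{N_1}].$$ ■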
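A join dependency can be tested directly by joining the projections and comparing the result with the relation. The sketch below uses a hypothetical four-tuple relation on ABC, not taken from the paper, that satisfies the three-way join dependency on AB, BC, AC while the two-way join of R[AB] and R[BC] alone is lossy; the function names are assumptions.

```python
from functools import reduce


def project(rel, attrs):
    """Projection of a set of frozenset-of-(attr, value) tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r1, r2):
    """Natural join by nested loops over tuple pairs."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def satisfies_jd(rel, schemas):
    """True iff rel equals the join of its projections onto `schemas`."""
    return reduce(natural_join, (project(rel, s) for s in schemas)) == rel


def make_relation(rows, attrs="ABC"):
    return {frozenset(zip(attrs, row)) for row in rows}


# Hypothetical relation: the three-way join dependency holds on it.
r4 = make_relation([(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)])
# Dropping (0, 0, 0) keeps all three projections but breaks the dependency.
r3 = make_relation([(0, 0, 1), (0, 1, 0), (1, 0, 0)])
```

Here every split of the three schemas into two groups puts all of ABC on one side, so the decomposition pairs guaranteed by Lemma 10 are trivial for this collection.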

Let x be an X-value, and assume $Y \subseteq X$. We shall denote the
Y-value in x as $x[Y]$. Let $t \in R[\Omega]$ be a tuple and let $\Omega = \Omega_1 \cdots \Omega_n$.
The notation $t \triangleq (w_1, \ldots, w_n)$ will be used to indicate that $t[\Omega_i] = w_i$,
$\forall i \in N$, where $N = \{1, \ldots, n\}$ denotes the index set.

Before we state the necessary and sufficient conditions for join
dependency, we first introduce the concepts of a set of consistent values
and an indexed family of tuples.

Definition 11: Let R be a relation on the set of attributes $\Omega$, and let
$\{X_i \mid i \in N\}$ be a collection of subsets of $\Omega$. The set of values
$\{x_i \mid x_i \text{ is an } X_i\text{-value},\ i \in N\}$ is called a set of consistent values of
$\{X_i \mid i \in N\}$ if the values of $X_i \cap X_j$ in $x_i$ and $x_j$ agree, i.e., if

$$x_i[X_i \cap X_j] = x_j[X_i \cap X_j], \quad \forall i, j \in N.$$

The set of tuples $\{t_i \mid i \in N\}$ of $R[\Omega]$ is called an indexed family of
tuples with respect to $\{X_i \mid i \in N\}$ if $\{x_i \mid t_i[X_i] = x_i,\ i \in N\}$ is a set of
consistent values. ■

Theorem 11: Let R be a relation on the set of attributes $\Omega$, and let
$\Omega = \Omega_1 \cdots \Omega_n$. Then

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$$

iff for every indexed family of tuples $\{t_i \mid i \in N\}$ with respect to
$\{X_i \mid i \in N\}$ there is a tuple $t \in R[\Omega]$ such that $t[\Omega_i] = t_i[\Omega_i]$, $\forall i \in N$,
where $X_i = \Omega_i \cap \hat{\Omega}_i$ and $\hat{\Omega}_i = \bigcup_{j \ne i} \Omega_j$.

Proof: (Necessity) Let $\{t_i \mid i \in N\}$ be an indexed family of tuples of
$R[\Omega]$ with respect to $\{X_i \mid i \in N\}$. Thus, $\{x_i \mid t_i[X_i] = x_i,\ i \in N\}$ is a set
of consistent values. Suppose $t_i[\Omega_i] = w_i$, $i \in N$. We want to show that
there exists a tuple $t \triangleq (w_1, \ldots, w_n) \in R[\Omega]$. We will prove this by
mathematical induction. We know that $s_1 \triangleq (w_1) \in R[\Omega_1]$ and
$(w_2) \in R[\Omega_2]$. Thus,

$$w_1[X_1] = x_1 \quad \text{and} \quad w_2[X_2] = x_2.$$

Since $\{x_i \mid i \in N\}$ is consistent, it follows that

$$w_1[X_1 \cap X_2] = x_1[X_1 \cap X_2] = x_2[X_1 \cap X_2] = w_2[X_1 \cap X_2].$$

It is known that

$$X_i \cap X_j = (\Omega_i \cap \hat{\Omega}_i) \cap (\Omega_j \cap \hat{\Omega}_j) = \Omega_i \cap \Omega_j, \quad i \ne j.$$

Therefore,

$$w_1[\Omega_1 \cap \Omega_2] = w_2[\Omega_1 \cap \Omega_2].$$

By the definition of natural join, we know that there exists a tuple

$$s_2 \triangleq (w_1, w_2) \in R[\Omega_1] \bowtie R[\Omega_2].$$

Suppose there is a tuple $s_{n-1} \triangleq (w_1, \ldots, w_{n-1}) \in R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_{n-1}]$.
Then

$$s_{n-1}[X_i \cap X_n] = w_i[X_i \cap X_n] = x_i[X_i \cap X_n] = x_n[X_i \cap X_n] = w_n[X_i \cap X_n], \quad i = 1, \ldots, n-1.$$

Hence,

$$s_{n-1}[\Omega_n \cap \hat{\Omega}_n] = s_{n-1}[\Omega_n \cap (\Omega_1 \cup \cdots \cup \Omega_{n-1})]$$
$$= s_{n-1}[(\Omega_1 \cap \Omega_n) \cup \cdots \cup (\Omega_{n-1} \cap \Omega_n)]$$
$$= s_{n-1}[(X_1 \cap X_n) \cup \cdots \cup (X_{n-1} \cap X_n)]$$
$$= w_n[(X_1 \cap X_n) \cup \cdots \cup (X_{n-1} \cap X_n)]$$
$$= w_n[\Omega_n \cap \hat{\Omega}_n].$$

It follows that there exists a tuple t such that

$$t = s_n \triangleq (w_1, \ldots, w_n) \in R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n] = R[\Omega].$$

(Sufficiency) We know that

$$R[\Omega] \subseteq R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n].$$

For any $t \triangleq (w_1, \ldots, w_n) \in R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$, there exists an
indexed family of tuples $\{t_i \mid t_i[\Omega_i] = w_i,\ i \in N\}$ of $R[\Omega]$ with respect to
$\{X_i \mid i \in N\}$ that has a set of consistent values $\{x_i \mid w_i[X_i] = x_i,\ i \in N\}$.
It follows that $t \triangleq (w_1, \ldots, w_n) \in R[\Omega]$. Hence,

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n].$$ ■

The necessary and sufficient conditions for JD given in the above
theorem are similar to the notion of template dependency introduced
by Sadri and Ullman (Ref. 27). The following condition can be considered
as an extension of the binary natural join operation.

Corollary 3: Let R be a relation on the set of attributes $\Omega$, and
$\Omega = \Omega_1 \cdots \Omega_n$. Then,

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$$

iff

$$R_{x_1 \ldots x_n}[\Omega] = R_{x_1}[\Omega_1] \bowtie \cdots \bowtie R_{x_n}[\Omega_n]$$

for every set of consistent values $\{x_i \mid i \in N\}$ of $\{X_i \mid i \in N\}$, where
$X_i = \Omega_i \cap \hat{\Omega}_i$, $\forall i \in N$, and
$R_{x_1 \ldots x_n}[\Omega] = \{t \mid t \in R[\Omega],\ t[X_i] = x_i,\ \forall i \in N\}$.

Proof: The proof follows from Theorem 11. ■

Clearly, for any $t \in R[\Omega] = R[\Omega_1 \cdots \Omega_n]$, the set of values
$\{x_i \mid t[X_i] = x_i,\ X_i = \Omega_i \cap \hat{\Omega}_i,\ i \in N\}$ is always consistent. The converse is
not necessarily true. Suppose for any set of consistent values
$\{x_i \mid x_i \text{ is an } X_i\text{-value},\ i \in N\}$ there is a tuple $t \in R[\Omega]$ such that
$t[X_i] = x_i$, $\forall i \in N$; in this case we say $\{\Omega_i \mid i \in N\}$ is complete.

Corollary 4: Let R be a relation on the set of attributes $\Omega$, and
$\Omega = \Omega_1 \cdots \Omega_n$. Then $\{\Omega_i \mid i \in N\}$ is complete iff

$$R[X_1 \cdots X_n] = R[X_1] \bowtie \cdots \bowtie R[X_n],$$

where $X_i = \Omega_i \cap \hat{\Omega}_i$, $i \in N$.

Proof: The proof follows directly from Theorem 11. ■

The necessary and sufficient conditions for JD may be stated in a
different form, as follows:

Theorem 12: Let R be a relation on the set of attributes $\Omega$, and
$\Omega = \Omega_1 \cdots \Omega_n$. Then

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$$

iff

1. $(\Omega_i, \hat{\Omega}_i)$ is a decomposition pair, $i \in N$,

2. $\{\Omega_i \mid i \in N\}$ is complete, i.e.,

$$R[X_1 \cdots X_n] = R[X_1] \bowtie \cdots \bowtie R[X_n], \quad X_i = \Omega_i \cap \hat{\Omega}_i,\ i \in N.$$

Proof: (Necessity) Condition 1 follows from Lemma 10. Condition 2 is
a consequence of Theorem 11.

(Sufficiency) We know that

$$R[\Omega] \subseteq R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n].$$

Suppose $t \triangleq (w_1, \ldots, w_n) \in R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$. Then there is
an indexed family of tuples $\{t_i \mid t_i[\Omega_i] = w_i,\ i \in N\}$ of $R[\Omega]$ with respect
to $\{X_i \mid i \in N\}$ and the set of consistent values
$\{x_i \mid w_i[X_i] = x_i,\ X_i = \Omega_i \cap \hat{\Omega}_i,\ i \in N\}$. We will prove by mathematical
induction that $t \triangleq (w_1, \ldots, w_n) \in R[\Omega]$.

Since $\{\Omega_i \mid i \in N\}$ is complete, there exists a tuple
$s \triangleq (y_1, \ldots, y_n) \in R[\Omega]$ such that

$$s[X_i] = x_i, \quad \forall i \in N.$$

We know that

$$t_1[\Omega_1 \cap \hat{\Omega}_1] = t_1[X_1] = w_1[X_1] = x_1 = s[X_1] = s[\Omega_1 \cap \hat{\Omega}_1],$$

which means

$$t_1\,\theta(\Omega_1 \cap \hat{\Omega}_1)\,s.$$

Since $(\Omega_1, \hat{\Omega}_1)$ is a decomposition pair, there exists a tuple
$s_1 \in R[\Omega]$ such that

$$t_1\,\theta(\Omega_1)\,s_1\,\theta(\hat{\Omega}_1)\,s.$$

Hence,

$$s_1 \triangleq (w_1, y_2, \ldots, y_n) \in R[\Omega].$$

Suppose there is a tuple $s_{n-1} \triangleq (w_1, \ldots, w_{n-1}, y_n) \in R[\Omega]$. It follows
that

$$t_n[\Omega_n \cap \hat{\Omega}_n] = t_n[X_n] = w_n[X_n] = x_n = s_{n-1}[X_n] = s_{n-1}[\Omega_n \cap \hat{\Omega}_n].$$

Thus there is a tuple $s_n \in R[\Omega]$ such that

$$t_n\,\theta(\Omega_n)\,s_n\,\theta(\hat{\Omega}_n)\,s_{n-1}.$$

Hence

$$t = s_n \triangleq (w_1, \ldots, w_n) \in R[\Omega].$$ ■

It is known that a special class of JD, called acyclic join dependency,
has many desirable properties; this class makes operations like updates
and joins especially easy (Refs. 15, 28). A collection of subsets
$\{\Omega_i \mid i \in N\}$ of the set of attributes $\Omega$ is called acyclic if all the
attributes can be deleted by repeatedly applying the following two
operations (Refs. 15, 28):

1. Delete from some $\Omega_i$ an attribute A that appears in no other $\Omega_j$;

2. Delete one $\Omega_i$ if there is an $\Omega_j$, $i \ne j$, such that $\Omega_i \subseteq \Omega_j$.
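The two deletion operations translate into a simple fixed-point loop. The following is a naive sketch, assuming attribute sets are given as strings of single-letter attribute names; it is not the linear-time test cited in the references, and the function name is hypothetical.

```python
def is_acyclic(collection):
    """Apply operations 1 and 2 until nothing changes; acyclic iff all
    attributes have been deleted at that point."""
    sets = [set(s) for s in collection]
    changed = True
    while changed:
        changed = False
        # Operation 1: delete an attribute that appears in no other set.
        for s in sets:
            for a in list(s):
                if sum(a in t for t in sets) == 1:
                    s.discard(a)
                    changed = True
        # Operation 2: delete one set that is contained in another set.
        for i, s in enumerate(sets):
            if any(i != j and s <= t for j, t in enumerate(sets)):
                del sets[i]
                changed = True
                break
    return all(not s for s in sets)


chain = ["AB", "BC", "CD"]
triangle = ["AB", "BC", "AC"]
```

For the chain, the attributes A and D are removed first and the remaining sets then absorb one another; for the triangle, neither operation ever applies.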

A reduction $\{Y_j \mid j \in J \subseteq N, \text{ and } \forall i \in N - J\ \exists j \in J \text{ such that } Y_i \subseteq Y_j\}$
is obtained by removing from $\{Y_i \mid i \in N\}$ each $Y_i$ that is contained
in another $Y_j$.

Definition 12: Let $S = \{\Omega_i \mid i \in N\}$ be a collection of subsets of $\Omega$. The
core of S, denoted by $\tilde{S}$, is defined as follows:

1. $\tilde{S} = \emptyset$, for $|S| = |N| = 1$

2. $\tilde{S}$ is the reduction of $\{\Omega_i \cap \hat{\Omega}_i \mid i \in N\}$, for $|S| = |N| > 1$. ■
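The core of Definition 12 can be computed in two steps: intersect each set with the union of the others, then take the reduction. This is a minimal sketch with assumed function names, using strings of single-letter attribute names as input.

```python
def reduction(sets):
    """Remove every set that is properly contained in another one;
    equal sets are merged into a single copy."""
    unique = {frozenset(s) for s in sets}
    return {s for s in unique if not any(s < t for t in unique)}


def core(collection):
    """Core of a collection: empty for a single set, otherwise the
    reduction of each set intersected with the union of the others."""
    sets = [frozenset(s) for s in collection]
    if len(sets) <= 1:
        return set()
    trimmed = []
    for i, s in enumerate(sets):
        others = frozenset().union(*(t for j, t in enumerate(sets) if j != i))
        trimmed.append(s & others)
    return reduction(trimmed)


chain_core = core(["AB", "BC", "CD"])
triangle_core = core(["AB", "BC", "AC"])
```

The acyclic chain shrinks to a core with a single set, whereas the cyclic triangle is its own core and does not shrink at all.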
There are many different but equivalent conditions that characterize
a collection of subsets as acyclic (Ref. 15). We will use the following one:

Lemma 11: A collection $S = \{\Omega_i \mid i \in N\}$ of subsets of $\Omega$ is acyclic iff its
core $\tilde{S}$ is acyclic.

Proof: $\tilde{S}$ can be obtained from S by performing the operations 1 and
2 defined above. It follows that if S is acyclic then $\tilde{S}$ is acyclic and
vice versa. ■

Corollary 5: Let $S = \{\Omega_i \mid i \in N\}$ be an acyclic collection of subsets of $\Omega$.
Then $|S| > |\tilde{S}|$.

Proof: For $|S| = 1$, $|\tilde{S}| = |\emptyset| = 0$. For $|S| \ge 2$, we know that
$|S| \ge |\tilde{S}|$. Suppose $|S| = |\tilde{S}|$. Then any attribute A in $\tilde{S}$ must be
contained in at least two distinct subsets of $\tilde{S}$. Let $A \in \Omega_i \cap \hat{\Omega}_i$. Then
$A \in \Omega_i$ and $A \in \hat{\Omega}_i$. There is a $j \ne i$ such that $A \in \Omega_j$. Since
$\Omega_i \subseteq \hat{\Omega}_j = \bigcup_{k \ne j} \Omega_k$, it follows that

$$A \in \Omega_j \cap \hat{\Omega}_j \in \tilde{S}.$$

Since $|\tilde{S}| = |S| \ge 2$, $\tilde{S}$ is not empty. Now, neither operation 1 nor
2 can be applied to reduce $\tilde{S}$. From Lemma 11 we know this contradicts
the assumption that S is acyclic. Thus, $|S| > |\tilde{S}|$. ■

A JD $R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$, $\Omega = \Omega_1 \cdots \Omega_n$, is an acyclic
join dependency if $\{\Omega_i \mid i \in N\}$ is acyclic. A recursive condition for
acyclic join dependency is as follows:

Corollary 6: Let R be a relation on the set of attributes $\Omega = \Omega_1 \cdots \Omega_n$.
Then

$$R[\Omega] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]$$

is an acyclic join dependency iff

1. $(\Omega_i, \hat{\Omega}_i)$ is a decomposition pair of $R[\Omega]$, $i = 1, \ldots, n$,

2. $R[X_1 \cdots X_m] = R[X_1] \bowtie \cdots \bowtie R[X_m]$ is an acyclic join
dependency over the set $X_1 \cdots X_m \subseteq \Omega$, where $\{X_i \mid i = 1, \ldots, m\}$ is the
core of $\{\Omega_i \mid i \in N\}$.

Proof: The join dependency of a collection of sets and the join
dependency of its reduction are equivalent (Ref. 25). The proof easily
follows from Theorem 12 and Lemma 11. ■

The above corollary simply states that acyclic join dependency is
equivalent to a set of MVDs and EMVDs, i.e., a set of simultaneous
lattice equations that can be derived recursively. It has been shown by
hypergraph theory that an acyclic join dependency is equivalent to a
set of MVDs (Refs. 15, 16). That is, the converse of Lemma 10 is true for
acyclic join dependency; we will prove that the converse of Lemma 10 is
a consequence of Corollary 6.

Theorem 10: Let R be a relation on the set of attributes ft = fti • • • ft„ 
such that (ft, | iEN] is acyclic. Suppose for any N Q N = jl, • • • , n], 
Ni = N- No, and 
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R[a] = R[n No )\x\R[n Ni ]. 

Then 

R[Q] = i?[Oi] | X | ... | x | R[Q n ] 

is an acyclic join dependency. 

Proof: This theorem will be proved by mathematical induction on n. 
For the smallest nontrivial case n = 3, let the core set {X,| i — 1, • • • , 
m\ of [Qi\i = 1, 2, 3} be the reduction of { Y, = fi, n Qi\i = 1, 2, 3}. 
First we want to show that 

R[X X ...X m ] = R[X 1 ]\x\ ... \X\R[X m ). 

We know m < 3 from Corollary 5. There is nothing to be proved if 
m < 2. For m = 2, without loss of generality, let Xi = Y u X 2 = Y 2 , 
and Y 8 C Y 2 = X 2 . Then 

x, n x 2 = y, n y 2 = y, n y 2 y 3 = (y, n y 2 ) u (y, n y 3 ) 
= (fii n n 2 ) u (Qi n q 3 ) = Qi n n 2 fi 3 . 

Since (0 lf ft 2 Q 3 ) is a decomposition pair, 

e(Xi n x 2 ) = d(i2i n fi 2 fi 3 ) £ 0(fli) ° d(n 2 n 3 ). 

Also we have 

Gi 2 Y, = X 1? 



and 
Thus 

and 

It follows that 

Hence 



"2"3 — *2 * 3 — *2 — X 2 . 

0(« 2 n 3 ) ^ 0(X 2 ), 

M,) • 0(fl 2 fi 3 ) £ 0(*i) ■ 0(X 2 ). 

0(X, n x 2 ) c e(Xi) • 0(x 2 ). 



/?[X 1 X 2 ] = i?[X 1 ]|x|^[X 2 ]. 
It follows from Corollary 6 that 

i?[£2] = ft[CJ | X | R[Q 2 ] | X | R[Q 3 ]. 
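Suppose the theorem is true for all $k < n$. Let the core set
$\{X_i \mid i = 1, \ldots, m\}$ of $\{\Omega_i \mid i \in N\}$ be the
reduction of $\{Y_i = \Omega_i \cap \Omega_{\bar{i}} \mid i \in N\}$. We
know $m < n$, and for any $M_0 \subseteq M = \{1, \ldots, m\}$ and
$M_1 = M - M_0$ there is an $N_0 \subseteq N$ and $N_1 = N - N_0$ such that

$$M_0 \subseteq N_0, \qquad M_1 \subseteq N_1,$$

and

$$X_{M_0} = Y_{N_0}, \qquad X_{M_1} = Y_{N_1}.$$

Then,

$$X_{M_0} \cap X_{M_1} = Y_{N_0} \cap Y_{N_1}
= \bigcup_{i \in N_0,\, j \in N_1} (Y_i \cap Y_j)
= \bigcup_{i \in N_0,\, j \in N_1} (\Omega_i \cap \Omega_j)
= \Omega_{N_0} \cap \Omega_{N_1}.$$

Since $(\Omega_{N_0}, \Omega_{N_1})$ is a decomposition pair, we have

$$\theta(X_{M_0} \cap X_{M_1}) = \theta(\Omega_{N_0} \cap \Omega_{N_1}) \subseteq \theta(\Omega_{N_0}) \circ \theta(\Omega_{N_1}).$$

Also, we know

$$\Omega_{N_0} \supseteq Y_{N_0} = X_{M_0}$$

and

$$\Omega_{N_1} \supseteq Y_{N_1} = X_{M_1}.$$

Thus

$$\theta(\Omega_{N_0}) \le \theta(X_{M_0}), \qquad \theta(\Omega_{N_1}) \le \theta(X_{M_1}),$$

and

$$\theta(\Omega_{N_0}) \circ \theta(\Omega_{N_1}) \subseteq \theta(X_{M_0}) \circ \theta(X_{M_1}).$$

It follows that

$$\theta(X_{M_0} \cap X_{M_1}) \subseteq \theta(X_{M_0}) \circ \theta(X_{M_1}).$$

Hence

$$R[X_1 \cdots X_m] = R[X_{M_0}] \bowtie R[X_{M_1}]$$

for any $M_0 \subseteq M$, $M_1 = M - M_0$.
Since the theorem is true for $m < n$, we have

$$R[X_1 \cdots X_m] = R[X_1] \bowtie \cdots \bowtie R[X_m].$$

It follows from Corollary 6 that

$$R[\Omega_1 \cdots \Omega_n] = R[\Omega_1] \bowtie \cdots \bowtie R[\Omega_n]. \quad ■$$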

3196 THE BELL SYSTEM TECHNICAL JOURNAL, DECEMBER 1983 



Further discussion of the properties of acyclic join dependencies can 
be found in Refs. 15 and 16. A linear-time algorithm for testing 
acyclicity is given in Ref. 28. 
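
As an illustration of what Theorem 10 asserts, the following minimal Python sketch tests whether a small relation equals the join of its projections. It is not taken from the paper: the helper names `project`, `natural_join`, and `satisfies_jd`, the dictionary encoding of tuples, and the sample data are all assumptions made for this example. The scheme {AB, BC, CD} is used because it is a simple acyclic chain.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs; duplicates collapse."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Natural join of two sets of tuples, each a frozenset of (attr, value)."""
    joined = set()
    for l in left:
        l_values = dict(l)
        for r in right:
            # Two tuples combine when they agree on every shared attribute.
            if all(l_values.get(a, v) == v for a, v in r):
                joined.add(l | r)
    return joined


def satisfies_jd(rows, components):
    """True if the relation equals the join of its projections.

    Assumes the components together cover every attribute of the relation.
    """
    result = None
    for attrs in components:
        p = project(rows, attrs)
        result = p if result is None else natural_join(result, p)
    return result == project(rows, list(rows[0]))


# Hypothetical relation over A, B, C, D built as a join of AB, BC, and CD.
good = [dict(A=a, B=1, C=c, D=4 + c) for a in (1, 2) for c in (1, 2)]
bad = good[:-1]  # dropping one tuple breaks the join dependency

scheme = [["A", "B"], ["B", "C"], ["C", "D"]]
print(satisfies_jd(good, scheme))                          # full join dependency
print(satisfies_jd(good, [["A", "B"], ["B", "C", "D"]]))   # one binary split
print(satisfies_jd(bad, scheme))
```

The second call checks one of the binary decompositions that appear in the hypothesis of the theorem; the brute-force join is only practical for very small relations.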

VIII. CONCLUSIONS 

We have shown that lattice theory is a powerful tool in the analysis
of the structure of relational database systems. Using this tool, we
have established a unified theory of relations. As we have seen, almost
every concept in the existing relational database theory has a
counterpart in lattice theory. This suggests that further study of
relations should be carried out within the framework of lattice theory.
The independency theory of lattices, which is a generalization of the
familiar notion of independency in geometries [18,21], is especially
important and relevant to the structure of a relational database system
whose relation lattice is modular. This approach may lead to a geometric
interpretation of data dependencies and independencies, which would
make the theory more intuitive and also more useful for practical
application.

The establishment of this algebraic theory of relational databases is
done in the same spirit as the construction of probability theory. A
probability space is a triple $(\Omega, \Sigma, P)$, where $\Omega$ is the
sample space, $\Sigma$ is a $\sigma$-algebra of the subsets of $\Omega$, and
$P$ is a real-valued function, called a probability measure, defined on
the $\sigma$-algebra $\Sigma$ [17,29]. The notion of a $\sigma$-algebra of
sets also has an abstract generalization, namely it is a particular case
of a Boolean $\sigma$-algebra [30]. A comparison of the algebraic theory
of relational databases and probability theory is shown in Table VIII.

Table VIII. Comparison of probability theory and the theory of
relational databases

| Probability Theory | Theory of Relational Databases |
|---|---|
| Sample space $\Omega$ | Set of attributes $\Omega$ |
| $\Sigma$, the $\sigma$-algebra of subsets of $\Omega$ | $2^{\Omega}$, the Boolean algebra of subsets of $\Omega$ |
| Probability measure $P: \Sigma \to \mathbb{R}[0, 1]$ | Partition function $\theta: 2^{\Omega} \to \Pi[R(\Omega)]$ |
| $\sigma$-additivity: if $\{X_k\}$ is a denumerable collection of disjoint events, $P(\bigcup_k X_k) = \sum_k P(X_k)$ | Meet-morphism: if $\{X_k\}$ is a finite collection of sets of attributes, $\theta(\bigcup_k X_k) = \theta(X_1) \cdots \theta(X_n)$ |
| $P(\Omega) = 1$, $P(\emptyset) = 0$ | $\theta(\Omega) = 0$, $\theta(\emptyset) = 1$ |
| $0 \le P(X) \le 1$, $\forall X \in \Sigma$ | $0 \le \theta(X) \le 1$, $\forall X \in 2^{\Omega}$ |
| If $X \subseteq Y$, $P(X) \le P(Y)$ | If $X \supseteq Y$, $\theta(X) \le \theta(Y)$ |
| If $\Omega_1$ and $\Omega_2$ are independent, $P(\Omega_1 \cap \Omega_2) = P(\Omega_1) P(\Omega_2)$ | If $\Omega_1$ and $\Omega_2$ are decomposable, $\theta(\Omega_1 \cap \Omega_2) = \theta(\Omega_1) + \theta(\Omega_2) = \theta(\Omega_1) \circ \theta(\Omega_2) = \theta(\Omega_2) \circ \theta(\Omega_1)$ |
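
The meet-morphism and bound entries of Table VIII can be checked on a toy relation. The sketch below is illustrative only: the function names `theta` and `meet`, the encoding of a partition as a set of blocks of tuple indices, and the three-tuple relation are assumptions of this example, not constructions from the paper.

```python
def theta(rows, attrs):
    """Partition of tuple indices induced by agreement on attrs."""
    blocks = {}
    for i, r in enumerate(rows):
        key = tuple(r[a] for a in sorted(attrs))
        blocks.setdefault(key, set()).add(i)
    return frozenset(frozenset(b) for b in blocks.values())


def meet(p, q):
    """Meet of two partitions: all nonempty pairwise block intersections."""
    return frozenset(b & c for b in p for c in q if b & c)


# Hypothetical relation with attributes A, B, C and no duplicate tuples.
rows = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 2, "C": 2},
]

# Meet-morphism: theta(X ∪ Y) = theta(X) · theta(Y).
print(theta(rows, {"A", "B"}) == meet(theta(rows, {"A"}), theta(rows, {"B"})))

# theta(∅) is the single-block partition (the lattice 1);
# theta(Ω) is the partition into singletons (the lattice 0).
print(theta(rows, set()))
print(theta(rows, {"A", "B", "C"}))
```

Representing each partition by its blocks keeps the meet a simple pairwise intersection; the join needs a merging step and is not required for this check.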

We feel that this theory of relational databases can be used to 
analyze the nonquantitative aspects of data dependencies (or indepen- 
dencies), whereas probability theory is the basis of quantitative data 
analysis, namely statistics. This comparison is not meant to imply 
that there is a one-to-one correspondence between the theory of 
relational databases and the theory of probability. Nevertheless, we 
are convinced that the lattice theory could play a role in the theory of 
relational databases similar to the role measure theory plays in the 
theory of probability [17].

The computational algorithms for meet and join operations of 
partitions are given in Ref. 31, which provides the basic tools for 
future development of algorithms for relations. 

IX. ACKNOWLEDGMENT 

I am grateful to Dr. M. Eisenberg for his careful review of the 
manuscript and his helpful comments and suggestions. 

REFERENCES 

1. E. F. Codd, "A relational model of data for large shared data banks," Comm. ACM,
   13, No. 6 (June 1970), pp. 377-87.
2. E. F. Codd, "Further normalization of the database relational model," in Database
   Systems, R. Rustin, ed., Englewood Cliffs, NJ: Prentice Hall, 1972, pp. 33-64.
3. C. Beeri, P. A. Bernstein, and N. Goodman, "A sophisticated introduction to
   database normalization theory," Proc. Int. Conf. on Very Large Databases, West
   Berlin, Germany (September 1978), pp. 113-24.
4. P. A. Bernstein and N. Goodman, "What does Boyce-Codd normal form do?" Proc.
   6th Int. Conf. on Very Large Databases, Montreal, Canada (1980), pp. 245-59.
5. P. A. Bernstein, "Synthesizing third normal form relations from functional
   dependencies," ACM Trans. Database Syst., 1, No. 4 (December 1976), pp. 277-98.
6. J. D. Ullman, Principles of Database Systems, Rockville, MD: Computer Science
   Press, Inc., 1980.
7. R. Fagin, "Multivalued dependencies and a new normal form for relational
   databases," ACM Trans. Database Syst., 2, No. 3 (September 1977), pp. 262-78.
8. C. Zaniolo, "Analysis and design of relational schemata for database systems," Ph.D.
   Diss., Tech. Rep. UCLA-ENG-7661, U. of California, Los Angeles, CA, July 1976.
9. A. K. Arora and C. R. Carlson, "The information preserving properties of relational
   database transformations," Proc. 4th Int. Conf. on Very Large Databases, West
   Berlin, Germany (September 1978), pp. 352-9.
10. J. Rissanen, "Independent components of relations," ACM Trans. Database Syst.,
    2, No. 4 (December 1977), pp. 317-25.
11. J. Rissanen, "Theory of relations for databases - a tutorial survey," Proc. 7th Symp.
    Mathematical Foundations of Computer Science, Lecture Notes in Computer
    Science 64, J. Winkowski, ed., Berlin, Heidelberg: Springer-Verlag, 1978,
    pp. 537-51.
12. W. W. Armstrong, "Dependency structure of database relationships," Proc. IFIP
    74, Amsterdam: North-Holland Publ. Co. (1974), pp. 580-3.
13. C. Beeri, R. Fagin, and J. H. Howard, "A complete axiomatization for functional and
    multivalued dependencies in database relations," Proc. ACM SIGMOD Int. Conf.
    on Management of Data, Toronto, Canada (1977), pp. 47-61.
14. A. V. Aho, C. Beeri, and J. D. Ullman, "The theory of joins in relational databases,"
    ACM Trans. Database Syst., 4, No. 3 (September 1979), pp. 297-314.
15. C. Beeri, R. Fagin, D. Maier, A. Mendelzon, J. D. Ullman, and M. Yannakakis,
    "Properties of acyclic database schemes," Proc. Thirteenth Annual ACM Symp.
    on Theory of Computing (Milwaukee 1981), pp. 355-62.
16. R. Fagin, A. O. Mendelzon, and J. D. Ullman, "A simplified universal relation
    assumption and its properties," ACM Trans. Database Syst., 7, No. 3 (September
    1982), pp. 343-60.
17. M. Loeve, Probability Theory, 3rd ed., Princeton, NJ: Van Nostrand, 1963.
18. G. Birkhoff, Lattice Theory, 3rd ed., Providence, RI: American Mathematical Society
    Colloquium Publ. XXV, 1967.
19. C. Delobel and R. G. Casey, "Decomposition of a database and the theory of Boolean
    switching functions," IBM J. Res. and Develop., 17, No. 5 (September 1973),
    pp. 374-86.
20. P. M. Cohn, Algebra, Vol. 2, London: John Wiley, 1977.
21. A. Rosenfeld, An Introduction to Algebraic Structures, San Francisco, CA:
    Holden-Day, 1968.
22. H. C. Torng, Switching Circuits Theory and Logic Design, Reading, MA:
    Addison-Wesley Publ. Co., 1972.
23. T. W. Ling, F. W. Tompa, and T. Kameda, "An improved third normal form for
    relational databases," ACM Trans. Database Syst., 6, No. 2 (June 1981),
    pp. 329-46.
24. C. Delobel, "Normalization and hierarchical dependencies in the relational data
    model," ACM Trans. Database Syst., 3, No. 3 (September 1978), pp. 201-22.
25. C. Beeri, A. O. Mendelzon, Y. Sagiv, and J. D. Ullman, "Equivalence of relational
    database schemes," Proc. Eleventh Annual ACM Symposium on the Theory of
    Computing (1979), pp. 319-29.
26. W. W. Armstrong and C. Delobel, "Decompositions and functional dependencies
    in relations," ACM Trans. Database Syst., 5, No. 4 (December 1980), pp. 404-30.
27. F. Sadri and J. D. Ullman, "Template dependency: a large class of dependencies in
    relational databases and its complete axiomatization," J. ACM, 29, No. 2 (April
    1982), pp. 363-72.
28. R. E. Tarjan and M. Yannakakis, unpublished work.
29. A. N. Kolmogorov, Foundation of Probability, New York: Chelsea, 1950.
30. P. R. Halmos, Lectures on Boolean Algebra, New York: Springer-Verlag, 1974.
31. T. T. Lee, "Order-preserving representations of the partitions on the finite set," J.
    Combinatorial Theory, Series A, 31, No. 2 (September 1981), pp. 136-45.




APPENDIX A 

Properties of Meet and Join Operations 

In any lattice $(L, \cdot, +)$, the operations of meet and join satisfy the
following laws:

L1. $a \cdot a = a$, $a + a = a$; (Idempotent)
L2. $a \cdot b = b \cdot a$, $a + b = b + a$; (Commutative)
L3. $a \cdot (b \cdot c) = (a \cdot b) \cdot c$,
    $a + (b + c) = (a + b) + c$; (Associative)
L4. $a \cdot (a + b) = a + (a \cdot b) = a$; (Absorption)
L5. $a \le b$ iff $a \cdot b = a$,
    $a \le b$ iff $a + b = b$; (Consistency)
L6. $b \le c$ implies $a \cdot b \le a \cdot c$,
    $b \le c$ implies $a + b \le a + c$; (Isotone)
L7. $a \cdot (b + c) \ge (a \cdot b) + (a \cdot c)$,
    $a + (b \cdot c) \le (a + b) \cdot (a + c)$; (Distributive Inequalities)
L8. $a \le c$ implies $a + (b \cdot c) \le (a + b) \cdot c$. (Modular Inequality)

A lattice is called distributive if equality holds in L7 and is called
modular if equality holds in L8. A Boolean algebra is a lattice
$(L, \cdot, +, \bar{\ })$ with the following additional properties [30]:

L9. $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$,
    $a + (b \cdot c) = (a + b) \cdot (a + c)$; (Distributive Identities)
L10. $a \le c$ implies $a + (b \cdot c) = (a + b) \cdot c$; (Modular Identity)
L11. $L$ contains universal bounds 0, 1, which satisfy
    $0 \cdot a = 0$, $0 + a = a$,
    $1 \cdot a = a$, $1 + a = 1$;
L12. $\forall a \in L$, $\exists \bar{a} \in L$ such that
    $a \cdot \bar{a} = 0$, $a + \bar{a} = 1$, $\bar{\bar{a}} = a$,
    $\overline{(a \cdot b)} = \bar{a} + \bar{b}$, $\overline{(a + b)} = \bar{a} \cdot \bar{b}$.
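
Since the lattices of interest in this paper are partition lattices, the absorption law and the distributive and modular inequalities can be spot-checked on partitions of a small set. The following Python sketch is only illustrative: the names `partition`, `meet`, `join`, and `leq`, the block-set encoding, and the three sample partitions are assumptions of this example and are unrelated to the algorithms of Ref. 31.

```python
def partition(*blocks):
    """Build a partition from its blocks."""
    return frozenset(frozenset(b) for b in blocks)


def meet(p, q):
    """Coarsest common refinement: nonempty pairwise block intersections."""
    return frozenset(b & c for b in p for c in q if b & c)


def join(p, q):
    """Finest common coarsening: merge blocks of p linked by a block of q."""
    blocks = [set(b) for b in p]
    for c in q:
        merged, rest = set(c), []
        for b in blocks:
            if b & merged:
                merged |= b      # b shares an element with c, so absorb it
            else:
                rest.append(b)
        blocks = rest + [merged]
    return frozenset(frozenset(b) for b in blocks)


def leq(p, q):
    """p <= q when every block of p lies inside some block of q."""
    return all(any(b <= c for c in q) for b in p)


# Hypothetical partitions of {0, 1, 2, 3}; note that a <= c.
a = partition({0, 1}, {2}, {3})
b = partition({0}, {1, 2}, {3})
c = partition({0, 1}, {2, 3})

# Absorption.
print(meet(a, join(a, b)) == a and join(a, meet(a, b)) == a)
# Distributive inequalities.
print(leq(join(meet(a, b), meet(a, c)), meet(a, join(b, c))))
print(leq(join(a, meet(b, c)), meet(join(a, b), join(a, c))))
# Modular inequality (applies because a <= c).
print(leq(a, c) and leq(join(a, meet(b, c)), meet(join(a, b), c)))
```

These checks hold in every lattice, so they test the sketch rather than the theory; whether equality holds in the modular inequality depends on the particular partitions chosen.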



APPENDIX B 

The Proofs of Axioms for Functional and Multivalued Dependencies 

The first three of the following are Armstrong's axioms for
functional dependencies [12]:

B1. (Reflexivity for functional dependencies)

If $Y \subseteq X \subseteq \Omega$, then $X \to Y$.
Proof: $\theta(X) = \theta(Y(X - Y)) = \theta(Y) \cdot \theta(X - Y) \le \theta(Y)$. ■

B2. (Augmentation for functional dependencies)

If $X \to Y$ and $Z \subseteq \Omega$, then $XZ \to YZ$.
Proof: $\theta(XZ) = \theta(X) \cdot \theta(Z) \le \theta(Y) \cdot \theta(Z) = \theta(YZ)$. ■

B3. (Transitivity for functional dependencies)

If $X \to Y$ and $Y \to Z$, then $X \to Z$.
Proof: $\theta(X) \le \theta(Y)$ and $\theta(Y) \le \theta(Z)$ imply $\theta(X) \le \theta(Z)$. ■
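
The proofs of B1 to B3 rest on reading $X \to Y$ as the partition ordering $\theta(X) \le \theta(Y)$. A minimal Python sketch of that reading follows; the names `theta`, `leq`, and `fd_holds`, the block-set encoding of partitions, and the employee/department/manager relation are assumptions introduced here for illustration.

```python
def theta(rows, attrs):
    """Partition of tuple indices induced by agreement on attrs."""
    blocks = {}
    for i, r in enumerate(rows):
        blocks.setdefault(tuple(r[a] for a in attrs), set()).add(i)
    return frozenset(frozenset(b) for b in blocks.values())


def leq(p, q):
    """p <= q when every block of p lies inside some block of q."""
    return all(any(b <= c for c in q) for b in p)


def fd_holds(rows, x, y):
    """X -> Y holds in the relation iff theta(X) <= theta(Y)."""
    return leq(theta(rows, x), theta(rows, y))


# Hypothetical relation: E (employee), D (department), M (manager).
rows = [
    {"E": "e1", "D": "d1", "M": "m1"},
    {"E": "e2", "D": "d1", "M": "m1"},
    {"E": "e3", "D": "d2", "M": "m2"},
]

print(fd_holds(rows, ["E", "D"], ["D"]))        # B1: reflexivity
print(fd_holds(rows, ["D", "E"], ["M", "E"]))   # B2: augment D -> M with E
print(fd_holds(rows, ["E"], ["D"]) and fd_holds(rows, ["D"], ["M"])
      and fd_holds(rows, ["E"], ["M"]))         # B3: transitivity
print(fd_holds(rows, ["M"], ["E"]))             # M does not determine E
```

The test is a property of one relation instance; it does not derive dependencies that must hold in every instance of a schema.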

The next three axioms apply to multivalued dependencies [13]:

B4. (Complementation for multivalued dependencies)

If $X \to\to Y$, then $X \to\to \Omega - X - Y$.
Proof: $\theta(X) = \theta(XY) + \theta(XZ) = \theta(XY) \circ \theta(XZ) = \theta(XZ) \circ \theta(XY)$,
where $Z = \Omega - X - Y$. ■

B5. (Augmentation for multivalued dependencies)

If $X \to\to Y$ and $V \subseteq W$, then $WX \to\to VY$.
Proof: Without loss of generality,* we can let $\Omega = ABCDEFGHIJKL$,
$X = ABCDEF$, $Y = BCGHFI$, $W = CDEFHIJK$, $V = EFIJ$ (see Fig. 9).
Then $\Omega - X - Y = JKL$ and $\Omega - WX - VY = L$.

[Fig. 9. Set of attributes $\Omega$ for B5.]

We want to show that

$$\theta(ABCDEF) \subseteq \theta(ABCDEFGHI) \circ \theta(ABCDEFJKL)$$

implies

$$\theta(ABCDEFHIJK) \subseteq \theta(ABCDEFGHIJK) \circ \theta(ABCDEFHIJKL).$$

Suppose

$$t_1\,\theta(ABCDEFHIJK)\,t_2. \tag{1}$$

Then $t_1\,\theta(ABCDEF)\,t_2$, so by hypothesis there exists $t_3$ such that

$$t_1\,\theta(ABCDEFGHI)\,t_3\,\theta(ABCDEFJKL)\,t_2. \tag{2}$$

From (1) and (2), we have

$$t_3\,\theta(JK)\,t_2\,\theta(JK)\,t_1.$$

It follows from (2) that

$$t_1\,\theta(JKG)\,t_3. \tag{3}$$

Combining (2) and (3), we have

$$t_1\,\theta(ABCDEFGHIJK)\,t_3. \tag{4}$$

Relation (2) also implies that

$$t_1\,\theta(HI)\,t_3.$$

From (1), we know

$$t_1\,\theta(HI)\,t_2,$$

and therefore

$$t_3\,\theta(HI)\,t_2. \tag{5}$$

It follows from (2) and (5) that

$$t_3\,\theta(ABCDEFHIJKL)\,t_2. \tag{6}$$

Combining (4) and (6), we have

$$t_1\,\theta(ABCDEFGHIJK)\,t_3\,\theta(ABCDEFHIJKL)\,t_2.$$

It follows that

$$\theta(ABCDEFHIJK) \subseteq \theta(ABCDEFGHIJK) \circ \theta(ABCDEFHIJKL). \quad ■$$

* This proof is carried out in terms of equivalence relations (partitions). It is irrelevant
here whether an equivalence relation is the image of a single attribute or the image of a
set of attributes.

B6. (Transitivity for multivalued dependencies)

If $X \to\to Y$ and $Y \to\to Z$, then $X \to\to Z - Y$.

Proof: Again, without loss of generality, we can let $\Omega = ABCDEFGH$,
$X = AFGH$, $Y = BCFG$, $Z = CDGH$ (see Fig. 10). Then $Z - Y = DH$,
$\Omega - XY = DE$, $\Omega - YZ = AE$, $\Omega - X(Z - Y) = BCE$.

[Fig. 10. Set of attributes $\Omega$ for B6.]

We want to show that

$$\theta(AFGH) \subseteq \theta(ADEFGH) \circ \theta(ABCFGH)$$

and

$$\theta(BCFG) \subseteq \theta(ABCEFG) \circ \theta(BCDFGH)$$

imply

$$\theta(AFGH) \subseteq \theta(ADFGH) \circ \theta(ABCEFGH).$$

Suppose $t_1\,\theta(AFGH)\,t_2$. Then there exists $t_3$ such that

$$t_1\,\theta(ADEFGH)\,t_3\,\theta(ABCFGH)\,t_2. \tag{7}$$

Since $t_2\,\theta(BCFG)\,t_3$, there exists $t_4$ such that

$$t_2\,\theta(ABCEFG)\,t_4\,\theta(BCDFGH)\,t_3. \tag{8}$$

It follows that

$$t_1\,\theta(AFG)\,t_3\,\theta(AFG)\,t_2\,\theta(AFG)\,t_4. \tag{9}$$

From (7) and (8), we have

$$t_1\,\theta(DH)\,t_3\,\theta(DH)\,t_4. \tag{10}$$

Combining (9) and (10) yields

$$t_1\,\theta(ADFGH)\,t_4. \tag{11}$$

From (7) and (8), we have

$$t_4\,\theta(H)\,t_3\,\theta(H)\,t_2.$$

It follows from (8) that

$$t_4\,\theta(ABCEFGH)\,t_2. \tag{12}$$

Relations (11) and (12) yield

$$t_1\,\theta(ADFGH)\,t_4\,\theta(ABCEFGH)\,t_2.$$

Hence

$$\theta(AFGH) \subseteq \theta(ADFGH) \circ \theta(ABCEFGH). \quad ■$$

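The last two axioms relate functional and multivalued dependencies.

B7. If $X \to Y$, then $X \to\to Y$.

Proof: Let $Z = \Omega - XY$. We want to show that

$$\theta(X) \le \theta(Y) \text{ implies } \theta(X) \subseteq \theta(XY) \circ \theta(XZ).$$

Suppose $t_1\,\theta(X)\,t_2$. Since $\theta(X) \le \theta(Y)$ implies
$\theta(XY) = \theta(X)$, then $t_1\,\theta(XY)\,t_2$. It follows that

$$t_1\,\theta(XY)\,t_2\,\theta(XZ)\,t_2.$$

Hence

$$\theta(X) \subseteq \theta(XY) \circ \theta(XZ). \quad ■$$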
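
The condition used throughout B4 to B7, namely $\theta(X) \subseteq \theta(XY) \circ \theta(XZ)$ with $Z = \Omega - XY$, can be tested directly by treating each equivalence relation as a set of index pairs. The Python sketch below is a brute-force illustration under that assumption; the names `theta_pairs`, `compose`, and `mvd_holds` and the course/teacher/book data are hypothetical.

```python
def theta_pairs(rows, attrs):
    """Equivalence relation of theta(attrs) as a set of index pairs."""
    return {(i, j)
            for i, r in enumerate(rows)
            for j, s in enumerate(rows)
            if all(r[a] == s[a] for a in attrs)}


def compose(p, q):
    """Relational composition: (i, k) whenever i p j and j q k for some j."""
    return {(i, k) for (i, j) in p for (j2, k) in q if j == j2}


def mvd_holds(rows, x, y):
    """Test theta(X) ⊆ theta(XY) ∘ theta(XZ), where Z = Ω - XY."""
    z = set(rows[0]) - set(x) - set(y)
    xy = set(x) | set(y)
    xz = set(x) | z
    return theta_pairs(rows, x) <= compose(theta_pairs(rows, xy),
                                           theta_pairs(rows, xz))


# Hypothetical relation: C (course), T (teacher), B (book).
full = [{"C": "c1", "T": t, "B": b} for t in ("t1", "t2") for b in ("b1", "b2")]
broken = full[:-1]                      # missing the (t2, b2) combination
single_teacher = [r for r in full if r["T"] == "t1"]   # here C -> T holds

print(mvd_holds(full, ["C"], ["T"]))            # teachers and books vary freely
print(mvd_holds(broken, ["C"], ["T"]))          # one combination is missing
print(mvd_holds(single_teacher, ["C"], ["T"]))  # B7: an FD gives an MVD
```

Enumerating all index pairs is quadratic in the number of tuples, which is acceptable only for small examples like this one.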

B8. If $X \to\to Y$, $Z \subseteq Y$, and for some $W$ disjoint from $Y$ we have
$W \to Z$, then $X \to Z$.

Proof: Again, without loss of generality, we can let $\Omega = ABCDEFGH$,
$X = ACEF$, $Y = EFGH$, $Z = FG$, and $W = AB$ (see Fig. 11). Then
$\Omega - XY = BD$.

[Fig. 11. Set of attributes $\Omega$ for B8.]

We want to show that

$$\theta(ACEF) \subseteq \theta(ACEFGH) \circ \theta(ABCDEF)$$

and

$$\theta(AB) \le \theta(FG)$$

imply

$$\theta(ACEF) \le \theta(FG).$$

Suppose $t_1\,\theta(ACEF)\,t_2$. Then there exists $t_3$ such that

$$t_1\,\theta(ACEFGH)\,t_3\,\theta(ABCDEF)\,t_2.$$

Since

$$t_1\,\theta(FG)\,t_3 \quad\text{and}\quad t_3\,\theta(AB)\,t_2,$$

we have

$$t_1\,\theta(FG)\,t_3 \quad\text{and}\quad t_3\,\theta(FG)\,t_2,$$

and thus

$$t_1\,\theta(FG)\,t_2.$$

Hence

$$\theta(ACEF) \le \theta(FG). \quad ■$$